| Age | Commit message (Collapse) | Author |
|
This completes the previous patches by moving notifier registration for
SF dev tables outside the devlink locked critical section in
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.
This is only done for non-SFs, since SFs do not have a SF HW table
themselves.
After this patch, notifiers can grab the PF devlink lock (soon to be
necessary) without creating a locking cycle.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the SF table notifiers registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.
This is only done for non-SFs, since SFs do not have a SF table
themselves and thus don't need notifiers.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the SF HW table notifier registration/unregistration outside of
mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() /
mlx5_mdev_uninit() functions.
This is only done for non-SFs, since SFs do not have a SF HW table
themselves.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The vhca event notifier consists of an atomic notifier for vhca state
changes (used for SF events), multiple workqueues and a blocking
notifier chain for delivering the vhca state change events for further
processing.
This patch moves the vhca notifier head outside of mlx5_init_one() /
mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit()
functions.
This allows called notifiers to grab the PF devlink lock which was
previously impossible because it would create a circular lock
dependency.
mlx5_vhca_event_stop() is now called earlier in the cleanup phase and
flushes the workqueues to ensure that after the call, there are no
pending events. This simplifies the cleanup flow for vhca event
consumers.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The esw mode change notifier chain is initialized/cleaned up in
mlx5_init_one() / mlx5_uninit_one() with the devlink lock held.
Move the notifier head from the eswitch struct into mlx5_priv directly,
and initialize it outside the critical section. This will allow notifier
registration to happen earlier in the init procedure in subsequent
patches.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763325940-1231508-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop TX packets when posting the work request fails and ensure DMA
mappings are always cleaned up.
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
per TX WQE. Exceeding this limit can cause TX failures.
Add ndo_features_check() callback to validate SKB layout before
transmission. For GSO SKBs that would exceed the hardware SGE limit, clear
NETIF_F_GSO_MASK to enforce software segmentation in the stack.
Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still
exceed the SGE limit.
Also, Add ethtool counter for SKBs linearized
Co-developed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In case of a FW issue, FW might be not responding to FW commands,
causing kernel lockout for a long period of time, e.g. rtnl_lock held
while ethtool is trying to collect stats waiting for FW to respond to
multiple commands, when all of them will timeout.
While there's no immediate indication of the FW lockout, we can safely
assume that something is wrong when all command slots are busy and in
a timeout state and no FW completion was received on any of them.
In such case, start immediately failing new commands.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1763415729-1238421-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The "phy-mode" property of devicetree indicates whether the PCB has
delay now, which means the mac needs to modify the PHY mode based
on whether there is an internal delay in the mac.
This modification is similar for many ethernet drivers. To simplify
code, define the helper phy_fix_phy_mode_for_mac_delays(speed, mac_txid,
mac_rxid) to fix PHY mode based on whether mac adds internal delay.
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251114003805.494387-3-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Report standard counter stats->rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.
Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.
To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails on HWC timeout
error and the stats are reset to 0. Other errors are transient
which will not need a VF reset for recovery.
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763120599-6331-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.
Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S <interface>, maintaining consistency with
previous behavior.
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763120599-6331-2-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-11-13
The following pull-request contains common mlx5 updates
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Expose definition for 1600Gbps link mode
net/mlx5: fs, set non default device per namespace
net/mlx5: fs, Add other_eswitch support for steering tables
net/mlx5: Add OTHER_ESWITCH HW capabilities
net/mlx5: Add direct ST mode support for RDMA
PCI/TPH: Expose pcie_tph_get_st_table_loc()
{rdma,net}/mlx5: Query vports mac address from device
====================
Link: https://patch.msgid.link/1763027252-1168760-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit bf40785fa437 ("sctp: Use HMAC-SHA1 and HMAC-SHA256 library for chunk
authentication") removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20251113114501.32905-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp_gro_pull_header() is used in GRO fast path, inline it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251113140358.58242-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since 93e86b3bc842 ("net: dsa: Remove legacy probing support")
this struct has no user any longer.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/4053a98f-052f-4dc1-a3d4-ed9b3d3cc7cb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.18-rc6).
No conflicts, adjacent changes in:
drivers/net/phy/micrel.c
96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814")
and a trivial one in tools/testing/selftests/drivers/net/Makefile.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth and Wireless. No known outstanding
regressions.
Current release - regressions:
- eth:
- bonding: fix mii_status when slave is down
- mlx5e: fix missing error assignment in mlx5e_xfrm_add_state()
Previous releases - regressions:
- sched: limit try_bulk_dequeue_skb() batches
- ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe
- af_unix: initialise scc_index in unix_add_edge()
- netpoll: fix incorrect refcount handling causing incorrect cleanup
- bluetooth: don't hold spin lock over sleeping functions
- hsr: Fix supervision frame sending on HSRv0
- sctp: prevent possible shift out-of-bounds
- tipc: fix use-after-free in tipc_mon_reinit_self().
- dsa: tag_brcm: do not mark link local traffic as offloaded
- eth: virtio-net: fix incorrect flags recording in big mode
Previous releases - always broken:
- sched: initialize struct tc_ife to fix kernel-infoleak
- wifi:
- mac80211: reject address change while connecting
- iwlwifi: avoid toggling links due to wrong element use
- bluetooth: cancel mesh send timer when hdev removed
- strparser: fix signed/unsigned mismatch bug
- handshake: fix memory leak in tls_handshake_accept()
Misc:
- selftests: mptcp: fix some flaky tests"
* tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
hsr: Follow standard for HSRv0 supervision frames
hsr: Fix supervision frame sending on HSRv0
virtio-net: fix incorrect flags recording in big mode
ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe
wifi: iwlwifi: mld: always take beacon ies in link grading
wifi: iwlwifi: mvm: fix beacon template/fixed rate
wifi: iwlwifi: fix aux ROC time event iterator usage
net_sched: limit try_bulk_dequeue_skb() batches
selftests: mptcp: join: properly kill background tasks
selftests: mptcp: connect: trunc: read all recv data
selftests: mptcp: join: userspace: longer transfer
selftests: mptcp: join: endpoints: longer transfer
selftests: mptcp: join: rm: set backup flag
selftests: mptcp: connect: fix fallback note due to OoO
ethtool: fix incorrect kernel-doc style comment in ethtool.h
mlx5: Fix default values in create CQ
Bluetooth: btrtl: Avoid loading the config file on security chips
net/mlx5e: Fix potentially misleading debug message
net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps
net/mlx5e: Fix maxrate wraparound in threshold between units
...
|
|
The ->setup() method implemented by dwmac-loongson and dwmac-sun8i
allocate the mac_device_info structure, as does stmmac_hwif_init().
This makes no sense.
Have stmmac_hwif_init() always allocate this structure, and pass it to
the ->setup() method to initialise when it is provided. Rename this
method to "mac_setup" to more accurately describe what it is doing.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vImWK-0000000DrIx-28vO@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2025-11-12
this is a pull request of 11 patches for net-next/main.
The first 3 patches are by Vadim Fedorenko and convert the CAN drivers
to use the ndo_hwtstamp callbacks.
Maud Spierings contributes a patch for the mcp251x driver that
converts it to use dev_err_probe().
The next 6 patches target the mcp251xfd driver and are by Gregor
Herburger and me. They add GPIO controller functionality to the
driver.
The final patch is by Chu Guangqing and fixes a typo in the bxcan
driver.
linux-can-next-for-6.19-20251112-2
* tag 'linux-can-next-for-6.19-20251112-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: bxcan: Fix a typo error for assign
dt-bindings: can: mcp251xfd: add gpio-controller property
can: mcp251xfd: add gpio functionality
can: mcp251xfd: only configure PIN1 when rx_int is set
can: mcp251xfd: add workaround for errata 5
can: mcp251xfd: utilize gather_write function for all non-CRC writes
can: mcp251xfd: move chip sleep mode into runtime pm
can: mcp251x: mcp251x_can_probe(): use dev_err_probe()
can: peak_usb: convert to use ndo_hwtstamp callbacks
can: peak_canfd: convert to use ndo_hwtstamp callbacks
can: convert generic HW timestamp ioctl to ndo_hwtstamp callbacks
====================
Link: https://patch.msgid.link/20251112184344.189863-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
- two minor fixes for DMA API infrastructure: restoring proper
structure padding used in benchmark tests (Qinxin Xia) and global
DMA_BIT_MASK macro rework to make it a bit more clang friendly (James
Clark)
* tag 'dma-mapping-6.18-2025-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma-mapping: Allow use of DMA_BIT_MASK(64) in global scope
dma-mapping: benchmark: Restore padding to ensure uABI remained consistent
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
More -next material, notably:
- split ieee80211.h file, it's way too big
- mac80211: initial chanctx work towards NAN
- mac80211: MU-MIMO sniffer improvements
- ath12k: statistics improvements
* tag 'wireless-next-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (26 commits)
wifi: cw1200: Fix potential memory leak in cw1200_bh_rx_helper()
wifi: mac80211: make monitor link info check more specific
wifi: mac80211: track MU-MIMO configuration on disabled interfaces
wifi: cfg80211/mac80211: Add fallback mechanism for INDOOR_SP connection
wifi: cfg80211/mac80211: clean up duplicate ap_power handling
wifi: cfg80211: use a C99 initializer in wiphy_register
wifi: cfg80211: fix doc of struct key_params
wifi: mac80211: remove unnecessary vlan NULL check
wifi: mac80211: pass frame type to element parsing
wifi: mac80211: remove "disabling VHT" message
wifi: mac80211: add and use chanctx usage iteration
wifi: mac80211: simplify ieee80211_recalc_chanctx_min_def() API
wifi: mac80211: remove chanctx to link back-references
wifi: mac80211: make link iteration safe for 'break'
wifi: mac80211: fix EHT typo
wifi: cfg80211: fix EHT typo
wifi: ieee80211: split NAN definitions out
wifi: ieee80211: split P2P definitions out
wifi: ieee80211: split S1G definitions out
wifi: ieee80211: split EHT definitions out
...
====================
Link: https://patch.msgid.link/20251112115126.16223-4-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch exposes new link mode for 1600Gbps, utilizing 8 lanes at
200Gbps per lane.
Co-developed-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1762863888-1092798-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_conn: Fix not cleaning up PA_LINK connections
- hci_event: Fix not handling PA Sync Lost event
- MGMT: cancel mesh send timer when hdev removed
- 6lowpan: reset link-local header on ipv6 recv path
- 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion
- L2CAP: export l2cap_chan_hold for modules
- 6lowpan: Don't hold spin lock over sleeping functions
- 6lowpan: add missing l2cap_chan_lock()
- btusb: reorder cleanup in btusb_disconnect to avoid UAF
- btrtl: Avoid loading the config file on security chips
* tag 'for-net-2025-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btrtl: Avoid loading the config file on security chips
Bluetooth: hci_event: Fix not handling PA Sync Lost event
Bluetooth: hci_conn: Fix not cleaning up PA_LINK connections
Bluetooth: 6lowpan: add missing l2cap_chan_lock()
Bluetooth: 6lowpan: Don't hold spin lock over sleeping functions
Bluetooth: L2CAP: export l2cap_chan_hold for modules
Bluetooth: 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion
Bluetooth: 6lowpan: reset link-local header on ipv6 recv path
Bluetooth: btusb: reorder cleanup in btusb_disconnect to avoid UAF
Bluetooth: MGMT: cancel mesh send timer when hdev removed
====================
Link: https://patch.msgid.link/20251111141357.1983153-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Building documentation produced the following warning:
WARNING: ./include/linux/ethtool.h:495 This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* IEEE 802.3ck/df defines 16 bins for FEC histogram plus one more for
This comment was not intended to be parsed as kernel-doc, so replace
the '/**' with '/*' to silence the warning and align with normal
comment style in header files.
No functional changes.
Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Link: https://patch.msgid.link/20251110182545.2112596-1-kriish.sharma2006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, CQs without a completion function are assigned the
mlx5_add_cq_to_tasklet function by default. This is problematic since
only user CQs created through the mlx5_ib driver are intended to use
this function.
Additionally, all CQs that will use doorbells instead of polling for
completions must call mlx5_cq_arm. However, the default CQ creation flow
leaves a valid value in the CQ's arm_db field, allowing FW to send
interrupts to polling-only CQs in certain corner cases.
These two factors would allow a polling-only kernel CQ to be triggered
by an EQ interrupt and call a completion function intended only for user
CQs, causing a null pointer exception.
Some areas in the driver have prevented this issue with one-off fixes
but did not address the root cause.
This patch fixes the described issue by adding defaults to the create CQ
flow. It adds a default dummy completion function to protect against
null pointer exceptions, and it sets an invalid command sequence number
by default in kernel CQs to prevent the FW from sending an interrupt to
the CQ until it is armed. User CQs are responsible for their own
initialization values.
Callers of mlx5_core_create_cq are responsible for changing the
completion function and arming the CQ per their needs.
Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Leon Romanovsky <leon@kernel.org>
Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This handles PA Sync Lost event which previously was assumed to be
handled with BIG Sync Lost but their lifetime are not the same thus why
there are 2 different events to inform when each sync is lost.
Fixes: b2a5f2e1c127 ("Bluetooth: hci_event: Add support for handling LE BIG Sync Lost event")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Add support for eswitch switchdev inactive mode
Inactive mode: Drop all traffic going to FDB, Remove
mpfs l2 rules and disconnect adjacent vports.
Active mode: Traffic flows through FDB, mpfs table populated, and
adjacent vports are connected.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Adithya Jayachandran <ajayachandra@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251108070404.1551708-4-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Adds DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE attribute to UAPI and
documentation.
Before having traffic flow through an eswitch, a user may want to have the
ability to block traffic towards the FDB until FDB is fully programmed and
the user is ready to send traffic to it. For example: when two eswitches
are present for vports in a multi-PF setup, one eswitch may take over the
traffic from the other when the user chooses.
Before this take over, a user may want to first program the inactive
eswitch and then once ready redirect traffic to this new eswitch.
switchdev modes transition semantics:
legacy->switchdev_inactive: Create switchdev mode normally, traffic not
allowed to flow yet.
switchdev_inactive->switchdev: Enable traffic to flow.
switchdev->switchdev_inactive: Block traffic on the FDB, FDB and
representros state and content is preserved.
When eswitch is configured to this mode, traffic is ignored/dropped on
this eswitch FDB, while current configuration is kept, e.g FDB rules and
netdev representros are kept available, FDB programming is allowed.
Example:
# start inactive switchdev
devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive
# setup TC rules, representors etc ..
# activate
devlink dev eswitch set pci/0000:08:00.1 mode switchdev
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20251108070404.1551708-2-saeed@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Implement fallback to LPI mode when SP mode is not permitted
by regulatory constraints for INDOOR_SP connections.
Limit fallback mechanism to client mode.
Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110140806.8b43201a34ae.I37fc7bb5892eb9d044d619802e8f2095fde6b296@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Move duplicated ap_power type handling code to an inline
function in cfg80211.
Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20251110140806.959948da1cb5.I893b5168329fb3232f249c182a35c99804112da6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since Eric proposed an idea about adding indirect call wrappers for
UDP and managed to see a huge improvement[1], the same situation can
also be applied in xsk scenario.
This patch adds an indirect call for xsk and helps current copy mode
improve the performance by around 1% stably which was observed with
IXGBE at 10Gb/sec loaded. If the throughput grows, the positive effect
will be magnified. I applied this patch on top of batch xmit series[2],
and was able to see <5% improvement from our internal application
which is a little bit unstable though.
Use INDIRECT wrappers to keep xsk_destruct_skb static as it used to
be when the mitigation config is off.
Be aware of the freeing path that can be very hot since the frequency
can reach around 2,000,000 times per second with the xdpsock test.
[1]: https://lore.kernel.org/netdev/20251006193103.2684156-2-edumazet@google.com/
[2]: https://lore.kernel.org/all/20251021131209.41491-1-kerneljasonxing@gmail.com/
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20251031103328.95468-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In the current implementation, usbnet uses a fixed tx_qlen of:
USB2: 60 * 1518 bytes = 91.08 KB
USB3: 60 * 5 * 1518 bytes = 454.80 KB
Such large transmit queues can be problematic, especially for cellular
modems. For example, with a typical celluar link speed of 10 Mbit/s, a
fully occupied USB3 transmit queue results in:
454.80 KB / (10 Mbit/s / 8 bit/byte) = 363.84 ms
of additional latency.
This patch adds support for Byte Queue Limits (BQL) [1] to dynamically
manage the transmit queue size and reduce latency without sacrificing
throughput.
Testing was performed on various devices using the usbnet driver for
packet transmission:
- DELOCK 66045: USB3 to 2.5 GbE adapter (ax88179_178a)
- DELOCK 61969: USB2 to 1 GbE adapter (asix)
- Quectel RM520: 5G modem (qmi_wwan)
- USB2 Android tethering (cdc_ncm)
No performance degradation was observed for iperf3 TCP or UDP traffic,
while latency for a prioritized ping application was significantly
reduced. For example, using the USB3 to 2.5 GbE adapter, which was fully
utilized by iperf3 UDP traffic, the prioritized ping was improved from
1.6 ms to 0.6 ms. With the same setup but with a 100 Mbit/s Ethernet
connection, the prioritized ping was improved from 35 ms to 5 ms.
[1] https://lwn.net/Articles/469652/
Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251106175615.26948-1-simon.schippers@tu-dortmund.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-11-10
We've added 19 non-merge commits during the last 3 day(s) which contain
a total of 22 files changed, 1345 insertions(+), 197 deletions(-).
The main changes are:
1) Preserve skb metadata after a TC BPF program has changed the skb,
from Jakub Sitnicki.
This allows a TC program at the end of a TC filter chain to still see
the skb metadata, even if another TC program at the front of the chain
has changed the skb using BPF helpers.
2) Initial af_smc bpf_struct_ops support to control the smc specific
syn/synack options, from D. Wythe.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf/selftests: Add selftest for bpf_smc_hs_ctrl
net/smc: bpf: Introduce generic hook for handshake flow
bpf: Export necessary symbols for modules with struct_ops
selftests/bpf: Cover skb metadata access after bpf_skb_change_proto
selftests/bpf: Cover skb metadata access after change_head/tail helper
selftests/bpf: Cover skb metadata access after bpf_skb_adjust_room
selftests/bpf: Cover skb metadata access after vlan push/pop helper
selftests/bpf: Expect unclone to preserve skb metadata
selftests/bpf: Dump skb metadata on verification failure
selftests/bpf: Verify skb metadata in BPF instead of userspace
bpf: Make bpf_skb_change_head helper metadata-safe
bpf: Make bpf_skb_change_proto helper metadata-safe
bpf: Make bpf_skb_adjust_room metadata-safe
bpf: Make bpf_skb_vlan_push helper metadata-safe
bpf: Make bpf_skb_vlan_pop helper metadata-safe
vlan: Make vlan_remove_tag return nothing
bpf: Unclone skb head on bpf_dynptr_write to skb metadata
net: Preserve metadata on pskb_expand_head
net: Helper to move packet data and metadata after skb_push/pull
====================
Link: https://patch.msgid.link/20251110232427.3929291-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported list_del(&sp->auto_asconf_list) corruption
in sctp_destroy_sock().
The repro calls setsockopt(SCTP_AUTO_ASCONF, 1) to a SCTP
listener, calls accept(), and close()s the child socket.
setsockopt(SCTP_AUTO_ASCONF, 1) sets sp->do_auto_asconf
to 1 and links sp->auto_asconf_list to a per-netns list.
Both fields are placed after sp->pd_lobby in struct sctp_sock,
and sctp_copy_descendant() did not copy the fields before the
cited commit.
Also, sctp_clone_sock() did not set them explicitly.
In addition, sctp_auto_asconf_init() is called from
sctp_sock_migrate(), but it initialises the fields only
conditionally.
The two fields relied on __GFP_ZERO added in sk_alloc(),
but sk_clone() does not use it.
Let's clear newsp->do_auto_asconf in sctp_clone_sock().
[0]:
list_del corruption. prev->next should be ffff8880799e9148, but was ffff8880799e8808. (prev=ffff88803347d9f8)
kernel BUG at lib/list_debug.c:64!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6008 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025
RIP: 0010:__list_del_entry_valid_or_report+0x15a/0x190 lib/list_debug.c:62
Code: e8 7b 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 7c ee 92 fd 49 8b 17 48 c7 c7 80 0a bf 8b 48 89 de 4c 89 f9 e8 07 c6 94 fc 90 <0f> 0b 4c 89 f7 e8 4c 26 71 fd 43 80 3c 2c 00 74 08 4c 89 ff e8 4d
RSP: 0018:ffffc90003067ad8 EFLAGS: 00010246
RAX: 000000000000006d RBX: ffff8880799e9148 RCX: b056988859ee6e00
RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffc90003067807 R09: 1ffff9200060cf00
R10: dffffc0000000000 R11: fffff5200060cf01 R12: 1ffff1100668fb3f
R13: dffffc0000000000 R14: ffff88803347d9f8 R15: ffff88803347d9f8
FS: 00005555823e5500(0000) GS:ffff88812613e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000480 CR3: 00000000741ce000 CR4: 00000000003526f0
Call Trace:
<TASK>
__list_del_entry_valid include/linux/list.h:132 [inline]
__list_del_entry include/linux/list.h:223 [inline]
list_del include/linux/list.h:237 [inline]
sctp_destroy_sock+0xb4/0x370 net/sctp/socket.c:5163
sk_common_release+0x75/0x310 net/core/sock.c:3961
sctp_close+0x77e/0x900 net/sctp/socket.c:1550
inet_release+0x144/0x190 net/ipv4/af_inet.c:437
__sock_release net/socket.c:662 [inline]
sock_close+0xc3/0x240 net/socket.c:1455
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 16942cf4d3e3 ("sctp: Use sk_clone() in sctp_accept().")
Reported-by: syzbot+ba535cb417f106327741@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/690d2185.a70a0220.22f260.000e.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20251106223418.1455510-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The introduction of IPPROTO_SMC enables eBPF programs to determine
whether to use SMC based on the context of socket creation, such as
network namespaces, PID and comm name, etc.
As a subsequent enhancement, to introduce a new generic hook that
allows decisions on whether to use SMC or not at runtime, including
but not limited to local/remote IP address or ports.
User can write their own implememtion via bpf_struct_ops now to choose
whether to use SMC or not before TCP 3rd handshake to be comleted.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20251107035632.115950-3-alibuda@linux.alibaba.com
|
|
Use the metadata-aware helper to move packet bytes after skb_push(),
ensuring metadata remains valid after calling the BPF helper.
Also, take care to reserve sufficient headroom for metadata to fit.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-6-5ceb08a9b37b@cloudflare.com
|
|
Use the metadata-aware helper to move packet bytes after skb_pull(),
ensuring metadata remains valid after calling the BPF helper.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-5-5ceb08a9b37b@cloudflare.com
|
|
All callers ignore the return value.
Prepare to reorder memmove() after skb_pull() which is a common pattern.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-4-5ceb08a9b37b@cloudflare.com
|
|
Currently bpf_dynptr_from_skb_meta() marks the dynptr as read-only when
the skb is cloned, preventing writes to metadata.
Remove this restriction and unclone the skb head on bpf_dynptr_write() to
metadata, now that the metadata is preserved during uncloning. This makes
metadata dynptr consistent with skb dynptr, allowing writes regardless of
whether the skb is cloned.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-3-5ceb08a9b37b@cloudflare.com
|
|
Lay groundwork for fixing BPF helpers available to TC(X) programs.
When skb_push() or skb_pull() is called in a TC(X) ingress BPF program, the
skb metadata must be kept in front of the MAC header. Otherwise, BPF
programs using the __sk_buff->data_meta pseudo-pointer lose access to it.
Introduce a helper that moves both metadata and a specified number of
packet data bytes together, suitable as a drop-in replacement for
memmove().
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251105-skb-meta-rx-path-v4-1-5ceb08a9b37b@cloudflare.com
|
|
The seq in struct key_params is for many ciphers, including CCMP, GCMP,
CMAC, GMAC. In addition to get_key(), it is also used when setting keys.
Signed-off-by: Chien Wong <m@xv97.com>
Link: https://patch.msgid.link/20251107142332.181308-1-m@xv97.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This will be needed for UHR operation parsing, and we
already pass whether or not the frame is an action
frame, replace that by the full type. Note this fixes
a few cases where 'false' was erroneously passed (mesh
and TDLS) and removes ieee802_11_parse_elems_crc() as
it's unused.
Link: https://patch.msgid.link/20251105160810.a476d20a6e01.Ie659535f9357f2f9a3c73f8c059ccfc96bf93b54@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is clearly EHT, not ETH, fix the typo.
Link: https://patch.msgid.link/20251105153958.12a04517f7ec.Idcf800817fa30605b1002c3d2287cad016e7aea7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is clearly EHT, not ETH, fix the typo.
Link: https://patch.msgid.link/20251105153958.e9d4af3b768e.I5f3378326837e3f62928a2f1fd3403f29cea069b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The ieee80211.h file has gotten very long, continue splitting
it by putting NAN definitions into a separate file. Note that
NAN isn't really even IEEE 802.11 but WFA.
Link: https://patch.msgid.link/20251105153843.8da0e796dda2.I7b2ce11220b70e8794019501eabbf8afbaf431a6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The ieee80211.h file has gotten very long, continue splitting
it by putting P2P definitions into a separate file. Note that
P2P isn't really even IEEE 802.11 but WFA.
Link: https://patch.msgid.link/20251105153843.e47b2614e9d2.Id242f61da720e365f6b5d7a4a545fbbc2f1e92b4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The ieee80211.h file has gotten very long, continue splitting
it by putting S1G definitions into a separate file.
Link: https://patch.msgid.link/20251105153843.82c0bddee6e3.Ic6646615286dad240b42e31e9d428c7e4ea40ce0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The ieee80211.h file has gotten very long, continue splitting
it by putting EHT definitions into a separate file.
Link: https://patch.msgid.link/20251105153843.bf77fe169140.I691267e0edd914c604a5bfd447d33be00044c9b4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The ieee80211.h file has gotten very long, continue splitting
it by putting HE definitions into a separate file.
Link: https://patch.msgid.link/20251105153843.6998c0802104.I3dd7cfea6abbd118b999ecdedd48437d39cb0533@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The ieee80211.h file has gotten very long, continue splitting
it by putting VHT definitions into a separate file.
Link: https://patch.msgid.link/20251105153843.c31cb771a250.I787a13064db7d80440101de3445be17881daf1b6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|