summaryrefslogtreecommitdiff
path: root/Documentation/networking/devlink
AgeCommit message (Collapse)Author
2025-11-20net/mlx5: implement swp_l4_csum_mode via devlink paramsDaniel Zahka
swp_l4_csum_mode controls how L4 transmit checksums are computed when using Software Parser (SWP) hints for header locations. Supported values: 1. default: device will choose between full_csum or l4_only. Driver will discover the device's choice during initialization. 2. full_csum: calculate L4 checksum with the pseudo-header. 3. l4_only: calculate L4 checksum without the pseudo-header. Only available when swp_l4_csum_mode_l4_only is set in mlx5_ifc_nv_sw_offload_cap_bits. Note that 'default' might be returned from the device and passed to userspace, and it might also be set during a devlink_param::reset_default() call, but attempts to set a value of default directly with param-set will be rejected. The l4_only setting is a dependency for PSP initialization in mlx5e_psp_init(). Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-5-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20devlink: support default values for param-get and param-setDaniel Zahka
Support querying and resetting to default param values. Introduce two new devlink netlink attrs: DEVLINK_ATTR_PARAM_VALUE_DEFAULT and DEVLINK_ATTR_PARAM_RESET_DEFAULT. The former is used to contain an optional parameter value inside of the param_value nested attribute. The latter is used in param-set requests from userspace to indicate that the driver should reset the param to its default value. To implement this, two new functions are added to the devlink driver api: devlink_param::get_default() and devlink_param::reset_default(). These callbacks allow drivers to implement default param actions for runtime and permanent cmodes. For driverinit params, the core latches the last value set by a driver via devl_param_driverinit_value_set(), and uses that as the default value for a param. Because default parameter values are optional, it would be impossible to discern whether or not a param of type bool has default value of false or not provided if the default value is encoded using a netlink flag type. For this reason, when a DEVLINK_PARAM_TYPE_BOOL has an associated default value, the default value is encoded using a u8 type. Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20251119025038.651131-4-daniel.zahka@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11devlink: Introduce switchdev_inactive eswitch modeSaeed Mahameed
Adds DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE attribute to UAPI and documentation. Before having traffic flow through an eswitch, a user may want to have the ability to block traffic towards the FDB until FDB is fully programmed and the user is ready to send traffic to it. For example: when two eswitches are present for vports in a multi-PF setup, one eswitch may take over the traffic from the other when the user chooses. Before this take over, a user may want to first program the inactive eswitch and then once ready redirect traffic to this new eswitch. switchdev modes transition semantics: legacy->switchdev_inactive: Create switchdev mode normally, traffic not allowed to flow yet. switchdev_inactive->switchdev: Enable traffic to flow. switchdev->switchdev_inactive: Block traffic on the FDB, FDB and representros state and content is preserved. When eswitch is configured to this mode, traffic is ignored/dropped on this eswitch FDB, while current configuration is kept, e.g FDB rules and netdev representros are kept available, FDB programming is allowed. Example: # start inactive switchdev devlink dev eswitch set pci/0000:08:00.1 mode switchdev_inactive # setup TC rules, representors etc .. # activate devlink dev eswitch set pci/0000:08:00.1 mode switchdev Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251108070404.1551708-2-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-06i40e: support generic devlink param "max_mac_per_vf"Mohammad Heib
Currently the i40e driver enforces its own internally calculated per-VF MAC filter limit, derived from the number of allocated VFs and available hardware resources. This limit is not configurable by the administrator, which makes it difficult to control how many MAC addresses each VF may use. This patch adds support for the new generic devlink runtime parameter "max_mac_per_vf" which provides administrators with a way to cap the number of MAC addresses a VF can use: - When the parameter is set to 0 (default), the driver continues to use its internally calculated limit. - When set to a non-zero value, the driver applies this value as a strict cap for VFs, overriding the internal calculation. Important notes: - The configured value is a theoretical maximum. Hardware limits may still prevent additional MAC addresses from being added, even if the parameter allows it. - Since MAC filters are a shared hardware resource across all VFs, setting a high value may cause resource contention and starve other VFs. - This change gives administrators predictable and flexible control over VF resource allocation, while still respecting hardware limitations. - Previous discussion about this change: https://lore.kernel.org/netdev/20250805134042.2604897-2-dhill@redhat.com https://lore.kernel.org/netdev/20250823094952.182181-1-mheib@redhat.com Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-11-06devlink: Add new "max_mac_per_vf" generic device paramMohammad Heib
Add a new device generic parameter to controls the maximum number of MAC filters allowed per VF. For example, to limit a VF to 3 MAC addresses: $ devlink dev param set pci/0000:3b:00.0 name max_mac_per_vf \ value 3 \ cmode runtime Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-11-03net: stmmac: rename devlink parameter ts_coarse into phc_coarse_adjMaxime Chevallier
The devlink param "ts_coarse" doesn't indicate that we get coarse timestamps, but rather that the PHC clock adjusments are coarse as the frequency won't be continuously adjusted. Adjust the devlink parameter name to reflect that. The Coarse terminlogy comes from the dwmac register naming, update the documentation to better explain what the parameter is about. With this change, the parameter can now be adjusted using: devlink dev param set <dev> name phc_coarse_adj value true cmode runtime Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251030182454.182406-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-28net: stmmac: Add a devlink attribute to control timestamping modeMaxime Chevallier
The DWMAC1000 supports 2 timestamping configurations to configure how frequency adjustments are made to the ptp_clock, as well as the reported timestamp values. There was a previous attempt at upstreaming support for configuring this mode by Olivier Dautricourt and Julien Beraud a few years back [1] In a nutshell, the timestamping can be either set in fine mode or in coarse mode. In fine mode, which is the default, we use the overflow of an accumulator to trigger frequency adjustments, but by doing so we lose precision on the timetamps that are produced by the timestamping unit. The main drawback is that the sub-second increment value, used to generate timestamps, can't be set to lower than (2 / ptp_clock_freq). The "fine" qualification comes from the frequent frequency adjustments we are able to do, which is perfect for a PTP follower usecase. In Coarse mode, we don't do frequency adjustments based on an accumulator overflow. We can therefore have very fine subsecond increment values, allowing for better timestamping precision. However this mode works best when the ptp clock frequency is adjusted based on an external signal, such as a PPS input produced by a GPS clock. This mode is therefore perfect for a Grand-master usecase. Introduce a driver-specific devlink parameter "ts_coarse" to enable or disable coarse mode, keeping the "fine" mode as a default. This can then be changed with: devlink dev param set <dev> name ts_coarse value true cmode runtime The associated documentation is also added. [1] : https://lore.kernel.org/netdev/20200514102808.31163-1-olivier.dautricourt@orolia.com/ Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20251024070720.71174-3-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-27net/mlx5: Expose uar access and odp page fault countersAkiva Goldberger
Add three counters to vnic health reporter: bar_uar_access, odp_local_triggered_page_fault, and odp_remote_triggered_page_fault. - bar_uar_access number of WRITE or READ access operations to the UAR on the PCIe BAR. - odp_local_triggered_page_fault number of locally-triggered page-faults due to ODP. - odp_remote_triggered_page_fault number of remotly-triggered page-faults due to ODP. Example access: $ devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 icm_consumption: 1032 bar_uar_access: 1279 odp_local_triggered_page_fault: 20 odp_remote_triggered_page_fault: 34 Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1758797130-829564-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17net/mlx5e: Use the 'num_doorbells' devlink paramCosmin Ratiu
Use the new devlink param to control how many doorbells mlx5e devices allocate and use. The maximum number of doorbells configurable is capped to the maximum number of channels. This only applies to the Ethernet part, the RDMA devices using mlx5 manage their own doorbells. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-17devlink: Add a 'num_doorbells' driverinit paramCosmin Ratiu
This parameter can be used by drivers to configure a different number of doorbells. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-16docs: devlink: Sort table of contents alphabeticallyKory Maincent (Dent Project)
Sort devlink documentation table of contents alphabetically to improve readability and make it easier to locate specific chapters. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250915-feature_poe_permanent_conf-v3-3-78871151088b@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15dpll: zl3073x: Implement devlink flash callbackIvan Vecera
Use the introduced functionality to read firmware files and flash their contents into the device's internal flash memory to implement the devlink flash update callback. Sample output on EDS2 development board: # devlink -j dev info i2c/1-0070 | jq '.[][]["versions"]["running"]' { "fw": "6026" } # devlink dev flash i2c/1-0070 file firmware_fw2.hex [utility] Prepare flash mode [utility] Downloading image 100% [utility] Flash mode enabled [firmware1-part1] Downloading image 100% [firmware1-part1] Flashing image [firmware1-part2] Downloading image 100% [firmware1-part2] Flashing image [firmware1] Flashing done [firmware2] Downloading image 100% [firmware2] Flashing image 100% [firmware2] Flashing done [utility] Leaving flash mode Flashing done # devlink -j dev info i2c/1-0070 | jq '.[][]["versions"]["running"]' { "fw": "7006" } Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20250909091532.11790-6-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09net/mlx5e: Make PCIe congestion event thresholds configurableDragos Tatulea
Add devlink driverinit parameters for configuring the thresholds for PCIe congestion events. These parameters are registered only when the firmware supports this feature. Update the mlx5 devlink docs as well on these new params. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1757237976-531416-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09net/mlx5: Implement devlink total_vfs parameterVlad Dumitrescu
Some devices support both symmetric (same value for all PFs) and asymmetric, while others only support symmetric configuration. This implementation prefers asymmetric, since it is closer to the devlink model (per function settings), but falls back to symmetric when needed. Example usage: devlink dev param set pci/0000:01:00.0 name total_vfs value <u16> cmode permanent devlink dev reload pci/0000:01:00.0 action fw_activate echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove echo 1 >/sys/bus/pci/rescan cat /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Tested-by: Kamal Heib <kheib@redhat.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250907012953.301746-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09net/mlx5: Implement devlink enable_sriov parameterVlad Dumitrescu
Example usage: devlink dev param set pci/0000:01:00.0 name enable_sriov value {true, false} cmode permanent devlink dev reload pci/0000:01:00.0 action fw_activate echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove echo 1 >/sys/bus/pci/rescan grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_* Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Tested-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250907012953.301746-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09net/mlx5: Implement cqe_compress_type via devlink paramsSaeed Mahameed
Selects which algorithm should be used by the NIC in order to decide rate of CQE compression dependeng on PCIe bus conditions. Supported values: 1) balanced, merges fewer CQEs, resulting in a moderate compression ratio but maintaining a balance between bandwidth savings and performance 2) aggressive, merges more CQEs into a single entry, achieving a higher compression rate and maximizing performance, particularly under high traffic loads. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250907012953.301746-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-09devlink: Add 'total_vfs' generic device paramVlad Dumitrescu
NICs are typically configured with total_vfs=0, forcing users to rely on external tools to enable SR-IOV (a widely used and essential feature). Add total_vfs parameter to devlink for SR-IOV max VF configurability. Enables standard kernel tools to manage SR-IOV, addressing the need for flexible VF configuration. Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Tested-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250907012953.301746-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-26devlink: Make health reporter burst period configurableShahar Shitrit
Enable configuration of the burst period — a time window starting from the first error recovery, during which the reporter allows recovery attempts for each reported error. This feature is helpful when a single underlying issue causes multiple errors, as it delays the start of the grace period to allow sufficient time for recovering all related errors. For example, if multiple TX queues time out simultaneously, a sufficient burst period could allow all affected TX queues to be recovered within that window. Without this period, only the first TX queue that reports a timeout will undergo recovery, while the remaining TX queues will be blocked once the grace period begins. Configuration example: $ devlink health set pci/0000:00:09.0 reporter tx burst_period 500 Configuration example with ynl: ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/devlink.yaml \ --do health-reporter-set --json '{ "bus-name": "auxiliary", "dev-name": "mlx5_core.eth.0", "port-index": 65535, "health-reporter-name": "tx", "health-reporter-burst-period": 500 }' Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250824084354.533182-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25Documentation: devlink: add devlink documentation for the kvaser_usb driverJimmy Assarsson
List the version information reported by the kvaser_usb driver through devlink. Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-12-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25Documentation: devlink: add devlink documentation for the kvaser_pciefd driverJimmy Assarsson
List the version information reported by the kvaser_pciefd driver through devlink. Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-11-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-09dpll: Add basic Microchip ZL3073x supportIvan Vecera
Microchip Azurite ZL3073x represents chip family providing DPLL and optionally PHC (PTP) functionality. The chips can be connected be connected over I2C or SPI bus. They have the following characteristics: * up to 5 separate DPLL units (channels) * 5 synthesizers * 10 input pins (references) * 10 outputs * 20 output pins (output pin pair shares one output) * Each reference and output can operate in either differential or single-ended mode (differential mode uses 2 pins) * Each output is connected to one of the synthesizers * Each synthesizer is driven by one of the DPLL unit The device uses 7-bit addresses and 8-bits values. It exposes 8-, 16-, 32- and 48-bits registers in address range <0x000,0x77F>. Due to 7bit addressing, the range is organized into pages of 128 bytes, with each page containing a page selector register at address 0x7F. For reading/writing multi-byte registers, the device supports bulk transfers. Add basic functionality to access device registers, probe functionality both I2C and SPI cases and add devlink support to provide info and to set clock ID parameter. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-6-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09devlink: Add new "clock_id" generic device paramIvan Vecera
Add a new device generic parameter to specify clock ID that should be used by the device for registering DPLL devices and pins. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250704182202.1641943-5-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-02devlink: Extend devlink rate API with traffic classes bandwidth managementCarolina Jubran
Introduce support for specifying relative bandwidth shares between traffic classes (TC) in the devlink-rate API. This new option allows users to allocate bandwidth across multiple traffic classes in a single command. This feature provides a more granular control over traffic management, especially for scenarios requiring Enhanced Transmission Selection. Users can now define a relative bandwidth share for each traffic class. For example, assigning share values of 20 to TC0 (TCP/UDP) and 80 to TC5 (RoCE) will result in TC0 receiving 20% and TC5 receiving 80% of the total bandwidth. The actual percentage each class receives depends on the ratio of its share value to the sum of all shares. Example: DEV=pci/0000:08:00.0 $ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \ tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0 $ devlink port function rate set $DEV/vfs_group \ tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0 Example usage with ynl: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-set --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1, "rate-tc-bws": [ {"rate-tc-index": 0, "rate-tc-bw": 50}, {"rate-tc-index": 1, "rate-tc-bw": 50}, {"rate-tc-index": 2, "rate-tc-bw": 0}, {"rate-tc-index": 3, "rate-tc-bw": 0}, {"rate-tc-index": 4, "rate-tc-bw": 0}, {"rate-tc-index": 5, "rate-tc-bw": 0}, {"rate-tc-index": 6, "rate-tc-bw": 0}, {"rate-tc-index": 7, "rate-tc-bw": 0} ] }' ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-get --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1 }' output for rate-get: {'bus-name': 'pci', 'dev-name': '0000:08:00.0', 'port-index': 1, 'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0}, {'rate-tc-bw': 50, 'rate-tc-index': 1}, {'rate-tc-bw': 0, 'rate-tc-index': 2}, {'rate-tc-bw': 0, 'rate-tc-index': 3}, {'rate-tc-bw': 0, 'rate-tc-index': 4}, {'rate-tc-bw': 0, 'rate-tc-index': 5}, {'rate-tc-bw': 0, 'rate-tc-index': 6}, {'rate-tc-bw': 0, 'rate-tc-index': 7}], 'rate-tx-max': 0, 'rate-tx-priority': 0, 'rate-tx-share': 0, 'rate-tx-weight': 0, 'rate-type': 'leaf'} Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250629142138.361537-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01docs: netdevsim: fixe typo in netdevsim documentationDave Marquardt
Fixed a typographical error in "Rate objects" section Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com> Link: https://patch.msgid.link/20250630-netdevsim-typo-fix-v3-1-e1eae3a5f018@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18devlink: Add new "enable_phc" generic device paramDavid Arinzon
Add a new device generic parameter to enable/disable the PHC (PTP Hardware Clock) functionality in the device associated with the devlink instance. Signed-off-by: David Arinzon <darinzon@amazon.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250617110545.5659-6-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14documentation: networking: devlink: Fix a typo in devlink-trap.rstAlper Ak
Fix a typo in the documentation: "errorrs" -> "errors". Signed-off-by: Alper Ak <alperyasinak1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250513092451.22387-1-alperyasinak1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29ixgbe: devlink: add devlink region support for E610Slawomir Mrozowicz
Provide support for the following devlink cmds: -DEVLINK_CMD_REGION_GET -DEVLINK_CMD_REGION_NEW -DEVLINK_CMD_REGION_DEL -DEVLINK_CMD_REGION_READ ixgbe devlink region implementation, similarly to the ice one, lets user to create snapshots of content of Non Volatile Memory, content of Shadow RAM, and capabilities of the device. For both NVM and SRAM regions provide .read() handler to let user read their contents without the need to create full snapshots. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15ixgbe: add support for devlink reloadJedrzej Jagielski
The E610 adapters contain an embedded chip with firmware which can be updated using devlink flash. The firmware which runs on this chip is referred to as the Embedded Management Processor firmware (EMP firmware). Activating the new firmware image currently requires that the system be rebooted. This is not ideal as rebooting the system can cause unwanted downtime. The EMP firmware itself can be reloaded by issuing a special update to the device called an Embedded Management Processor reset (EMP reset). This reset causes the device to reset and reload the EMP firmware. Implement support for devlink reload with the "fw_activate" flag. This allows user space to request the firmware be activated immediately. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Bharath R <bharath.r@intel.com> Co-developed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Co-developed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com> Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15ixgbe: add device flash update via devlinkJedrzej Jagielski
Use the pldmfw library to implement device flash update for the Intel ixgbe networking device driver specifically for E610 devices. This support uses the devlink flash update interface. Using the pldmfw library, the provided firmware file will be scanned for the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for the main NVM module containing the primary device firmware, and "fw.netlist" containing the netlist module. The flash is separated into two banks, the active bank containing the running firmware, and the inactive bank which we use for update. Each module is updated in a staged process. First, the inactive bank is erased, preparing the device for update. Second, the contents of the component are copied to the inactive portion of the flash. After all components are updated, the driver signals the device to switch the active bank during the next EMP reset. With this implementation, basic flash update for the E610 hardware is supported. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Bharath R <bharath.r@intel.com> Co-developed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Co-developed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Co-developed-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com> Signed-off-by: Stefan Wegrzyn <stefan.wegrzyn@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15ixgbe: add .info_get extension specific for E610 devicesJedrzej Jagielski
E610 devices give possibility to show more detailed info than the previous boards. Extend reporting NVM info with following pieces: fw.mgmt.api -> version number of the API fw.mgmt.build -> identifier of the source for the FW fw.mgmt.srev -> number defining FW's security revision fw.psid.api -> version defining the format of the flash contents fw.undi.srev -> number defining OROM's security revision fw.netlist -> version of the netlist module fw.netlist.build -> first 4 bytes of the netlist hash Co-developed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Co-developed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15ixgbe: add handler for devlink .info_get()Jedrzej Jagielski
Provide devlink .info_get() callback implementation to allow the driver to report detailed version information. The following info is reported: "serial_number" -> The PCI DSN of the adapter "fw.bundle_id" -> Unique identifier for the combined flash image "fw.undi" -> Version of the Option ROM containing the UEFI driver "board.id" -> The PBA ID string Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-15ixgbe: add initial devlink supportJedrzej Jagielski
Add an initial support for devlink interface to ixgbe driver. Similarly to i40e driver the implementation doesn't enable devlink to manage device-wide configuration. Devlink instance is created for each physical function of PCIe device. Create separate directory for devlink related ixgbe files and use naming scheme similar to the one used in the ice driver. Add a stub for Documentation, to be extended by further patches. Change struct ixgbe_adapter allocation to be done by devlink (Przemek), as suggested by Jiri. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Co-developed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18bnxt_en: Add devlink support for ENABLE_ROCE nvm parameterPavan Chebbi
Add set/show support for the ENABLE_ROCE NVM parameter to enable/disable RoCE for a PF. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Co-developed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-4-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-12net/mlx5: Expose ICM consumption per functionAkiva Goldberger
ICM is a portion of the host's memory assigned to a function by the OS through requests made by the NIC's firmware. PF ICM consumption can be accessed directly, while VF/SF ICM consumption can be accessed through their representors in switchdev mode. The value is exposed to the user in granularity of 4KB through the vnic health reporter as follows: $ devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 icm_consumption: 1032 Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11sfc: document devlink flash supportEdward Cree
Update the information in sfc's devlink documentation including support for firmware update with devlink flash. Also update the help text for CONFIG_SFC_MTD, as it is no longer strictly required for firmware updates. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/3476b0ef04a0944f03e0b771ec8ed1a9c70db4dc.1739186253.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05ice: devlink PF MSI-X max and min parameterMichal Swiatkowski
Use generic devlink PF MSI-X parameter to allow user to change MSI-X range. Add notes about this parameters into ice devlink documentation. Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-13net/mlx5: fs, add HWS to steering mode optionsMoshe Shemesh
Add HW Steering mode to mlx5 devlink param of steering mode options. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-14-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11devlink: Add documentation for OcteonTx2 AFLinu Cherian
Add documentation for the following devlink params - npc_mcam_high_zone_percent - npc_def_rule_cntr - nix_maxlf Signed-off-by: Linu Cherian <lcherian@marvell.com> Link: https://patch.msgid.link/20241105125620.2114301-4-lcherian@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-07-11 (net/intel) This series contains updates to most Intel network drivers. Tony removes MODULE_AUTHOR from drivers containing the entry. Simon Horman corrects a kdoc entry for i40e. Pawel adds implementation for devlink param "local_forwarding" on ice. Michal removes unneeded call, and code, for eswitch rebuild for ice. Sasha removed a no longer used field from igc. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Remove the internal 'eee_advert' field ice: remove eswitch rebuild ice: Add support for devlink local_forwarding param i40e: correct i40e_addr_to_hkey() name in kdoc net: intel: Remove MODULE_AUTHORs ==================== Link: https://patch.msgid.link/20240711201932.2019925-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/sched/act_ct.c 26488172b029 ("net/sched: Fix UAF when resolving a clash") 3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11ice: Add support for devlink local_forwarding paramPawel Kaminski
Add support for driver-specific devlink local_forwarding param. Supported values are "enabled", "disabled" and "prioritized". Default configuration is set to "enabled". Add documentation in networking/devlink/ice.rst. In previous generations of Intel NICs the transmit scheduler was only limited by PCIe bandwidth when scheduling/assigning hairpin-bandwidth between VFs. Changes to E810 HW design introduced scheduler limitation, so that available hairpin-bandwidth is bound to external port speed. In order to address this limitation and enable NFV services such as "service chaining" a knob to adjust the scheduler config was created. Driver can send a configuration message to the FW over admin queue and internal FW logic will reconfigure HW to prioritize and add more BW to VF to VF traffic. An end result, for example, 10G port will no longer limit hairpin-bandwidth to 10G and much higher speeds can be achieved. Devlink local_forwarding param set to "prioritized" enables higher hairpin-bandwitdh on related PFs. Configuration is applicable only to 8x10G and 4x25G cards. Changing local_forwarding configuration will trigger CORER reset in order to take effect. Example command to change current value: devlink dev param set pci/0000:b2:00.3 name local_forwarding \ value prioritized \ cmode runtime Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-08docs: networking: devlink: capitalise length valueChris Packham
Correct the example to match the help text from the devlink utility. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21octeontx2-pf: Add ucast filter count configurability via devlink.Sai Krishna
The existing method of reserving unicast filter count leads to wasted MCAM entries if the functionality is not used or fewer entries are used. Furthermore, the amount of MCAM entries differs amongst Octeon SoCs. We implemented a means to adjust the UC filter count via devlink, allowing for better use of MCAM entries across Netdev apps. commands: To get the current unicast filter count # devlink dev param show pci/0002:02:00.0 name unicast_filter_count To change/set the unicast filter count # devlink dev param set pci/0002:02:00.0 name unicast_filter_count value 5 cmode runtime Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22ice: Document tx_scheduling_layers parameterMichal Wilczynski
New driver specific parameter 'tx_scheduling_layers' was introduced. Describe parameter in the documentation. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-12net: hns3: add support to query scc version by devlink infoHao Chen
Add support to query scc version by devlink info for device V3. Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://lore.kernel.org/r/20240410125354.2177067-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-12nfp: update devlink device info outputFei Qin
Newer NIC will introduce a new part number, now add it into devlink device info. This patch also updates the information of "board.id" in nfp.rst to match the devlink-info.rst. Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-12devlink: add a new info version tagFei Qin
Add definition and documentation for the new generic info "board.part_number". The new one is for part number specific use, and board.id is modified to match the documentation in devlink-info. Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08devlink: Support setting max_io_eqsParav Pandit
Many devices send event notifications for the IO queues, such as tx and rx queues, through event queues. Enable a privileged owner, such as a hypervisor PF, to set the number of IO event queues for the VF and SF during the provisioning stage. example: Get maximum IO event queues of the VF device:: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10 Set maximum IO event queues of the VF device:: $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32 Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-28Documentation: Add documentation for eswitch attributeWilliam Tu
Provide devlink documentation for three eswitch attributes: mode, inline-mode, and encap-mode. Signed-off-by: William Tu <witu@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240325181228.6244-1-witu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()") 723de3ebef03 ("net: free altname using an RCU callback") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") drivers/net/ethernet/renesas/ravb_main.c ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path" ) c2da9408579d ("ravb: Add Rx checksum offload support for GbEth") net/mptcp/protocol.c bdd70eb68913 ("mptcp: drop the push_pending field") 28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields") Signed-off-by: Jakub Kicinski <kuba@kernel.org>