9 days  KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags  (Marc Zyngier)
A bunch of EL2 configurations are very similar to their EL1 counterparts, with the added constraint of HCR_EL2.E2H being 1. For us, this means HCR_EL2.E2H being RES1, which is something we can statically evaluate. Add a REQUIRES_E2H1 constraint, which allows us to express such conditions in a much simpler way (without extra code). Existing occurrences are converted, before we add a lot more. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  KVM: arm64: Simplify FIXED_VALUE handling  (Marc Zyngier)
The FIXED_VALUE qualifier (mostly used for HCR_EL2) is pointlessly complicated, as it tries to piggy-back on the previous RES0 handling while being done in a different phase, on different data. Instead, make it an integral part of the RESx computation, and allow it to directly set RESx bits. This is much easier to understand. It also paves the way for some additional changes that will allow the full removal of the FIXED_VALUE handling. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  KVM: arm64: Convert HCR_EL2.RW to AS_RES1  (Marc Zyngier)
Now that we have the AS_RES1 constraint, it becomes trivial to express the HCR_EL2.RW behaviour. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features  (Marc Zyngier)
A bunch of SCTLR_EL1 bits must be set to RES1 when the controlling feature is not present. Add the AS_RES1 qualifier where needed. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  KVM: arm64: Allow RES1 bits to be inferred from configuration  (Marc Zyngier)
So far, when a bit field is tied to an unsupported feature, we set it as RES0. This is almost correct, but there are a few exceptions where the bits become RES1. Add an AS_RES1 qualifier that instructs the RESx computing code to simply do that. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
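A minimal sketch of the AS_RES1 fold described above, under assumed names (feat_map, resx_masks, and the helpers here are illustrative, not the kernel's):

	#include <stdbool.h>
	#include <stdint.h>

	#define AS_RES1 (1u << 0)

	struct feat_map {
		uint64_t bits;       /* field mask within the register */
		unsigned int flags;  /* AS_RES1, ... */
	};

	struct resx_masks {
		uint64_t res0;       /* bits that must read as zero */
		uint64_t res1;       /* bits that must read as one  */
	};

	static void fold_unsupported(struct resx_masks *resx,
				     const struct feat_map *map,
				     bool feature_supported)
	{
		if (feature_supported)
			return;                  /* field stays writable */
		if (map->flags & AS_RES1)
			resx->res1 |= map->bits; /* exception: force RES1 */
		else
			resx->res0 |= map->bits; /* default: force RES0 */
	}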
9 days  KVM: arm64: Inherit RESx bits from FGT register descriptors  (Marc Zyngier)
The FGT registers have their computed RESx bits stashed in specific descriptors, which we can easily use when computing the masks used for the guest. This removes a bit of boilerplate code. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  KVM: arm64: Extend unified RESx handling to runtime sanitisation  (Marc Zyngier)
Add a new helper to retrieve the RESx values for a given system register, and use it for the runtime sanitisation. This results in slightly better code generation for a fairly hot path in the hypervisor, and additionally covers all sanitised registers in all conditions, not just the VNCR-based ones. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
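The sanitisation step itself is a two-mask operation; a sketch under assumed names (the kernel helper is not spelled like this):

	#include <stdint.h>

	/* Clear everything that must read as zero, then force
	 * everything that must read as one. */
	static inline uint64_t sanitise_reg(uint64_t val,
					    uint64_t res0, uint64_t res1)
	{
		return (val & ~res0) | res1;
	}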
9 days  KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits  (Marc Zyngier)
We have so far mostly tracked RES0 bits, but only made a few attempts at being just as strict for RES1 bits (probably because they are both rarer and harder to handle). Start scratching the surface by introducing a data structure tracking RES0 and RES1 bits at the same time. Note that contrary to the usual idiom, this structure is mostly passed around by value -- the ABI handles it nicely, and the resulting code is much nicer. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
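An illustrative shape for such a pair of masks (type and helper names are assumptions): two 64-bit fields fit in registers under the arm64 calling convention, which is why by-value passing stays cheap.

	#include <stdint.h>

	struct resx {
		uint64_t res0;
		uint64_t res1;
	};

	static struct resx resx_merge(struct resx a, struct resx b)
	{
		a.res0 |= b.res0;  /* accumulate must-be-zero bits */
		a.res1 |= b.res1;  /* accumulate must-be-one bits  */
		return a;          /* returned in registers, no pointers */
	}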
9 days  KVM: arm64: Introduce standalone FGU computing primitive  (Marc Zyngier)
Computing the FGU bits is made oddly complicated, as we use the RES0 helper instead of using a specific abstraction. Introduce such an abstraction, which is going to make things significantly simpler in the future. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E}  (Marc Zyngier)
We already have specific constraints for SCTLR_EL1.{EE,E0E}, and making them depend on FEAT_AA64EL1 is just buggy. Fixes: 6bd4a274b026e ("KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation") Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  arm64: Convert SCTLR_EL2 to sysreg infrastructure  (Marc Zyngier)
Convert SCTLR_EL2 to the sysreg infrastructure, as per the 2025-12_rel revision of the Registers.json file. Note that we slightly deviate from the above, as we stick to the ARM ARM M.a definition of SCTLR_EL2[9], which is RES0, in order to avoid dragging in the POE2 definitions... Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
9 days  drm/xe/guc: Fix CFI violation in debugfs access.  (Daniele Ceraolo Spurio)
xe_guc_print_info is void-returning, but the function pointer it is assigned to expects an int-returning function, leading to the following CFI error:

[ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe] (target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a)

Fix this by updating xe_guc_print_info to return an integer. Fixes: e15826bb3c2c ("drm/xe/guc: Refactor GuC debugfs initialization") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: George D Sworo <george.d.sworo@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com (cherry picked from commit dd8ea2f2ab71b98887fdc426b0651dbb1d1ea760) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
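A standalone illustration of why kCFI trips here (simplified; not the xe code): the type hash of a void-returning callee differs from the int-returning type the function pointer declares, so the indirect call traps.

	struct drm_printer;

	static void print_info(struct drm_printer *p) { (void)p; } /* returns void */
	typedef int (*show_fn)(struct drm_printer *p);             /* expects int  */

	/* The cast compiles, but calling through `show` fails the CFI
	 * type check at runtime. Making the callee return int fixes it. */
	static show_fn show = (show_fn)print_info;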
9 days  locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK  (Marco Elver)
Move _raw_write_trylock_irqsave() after the _raw_write_trylock macro to ensure it uses the inlined version, fixing a linker error when inlining is enabled. This is the case on s390:

>> ld.lld: error: undefined symbol: _raw_write_trylock
>>> referenced by rwlock_api_smp.h:48 (include/linux/rwlock_api_smp.h:48)
>>> lib/test_context-analysis.o:(test_write_trylock_extra) in archive vmlinux.a
>>> referenced by rwlock_api_smp.h:48 (include/linux/rwlock_api_smp.h:48)
>>> lib/test_context-analysis.o:(test_write_trylock_extra) in archive vmlinux.a

Closes: https://lore.kernel.org/oe-kbuild-all/202602032101.dbxRfsWO-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260203225114.3493538-1-elver@google.com
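A standalone illustration of the ordering issue (not the kernel headers): inside an inline function body, an identifier binds to whatever is visible at that point in the translation unit, so a function-like macro defined later never takes effect.

	extern int _trylock(void);          /* out-of-line symbol */

	static inline int trylock_irqsave(void)
	{
		return _trylock();          /* binds here: macro not seen yet */
	}

	#define _trylock() (1)              /* too late for the body above */

Moving the macro definition before the inline function lets the body pick up the inlined version instead of referencing the undefined out-of-line symbol.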
9 days  s390: remove kvm_types.h from Kbuild  (Randy Dunlap)
kvm_types.h is mandatory in include/asm-generic/Kbuild so having it in another Kbuild file causes a warning. Remove it from the arch/ Kbuild file to fix the warning.

../scripts/Makefile.asm-headers:39: redundant generic-y found in ../arch/s390/include/asm/Kbuild: kvm_types.h

Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260203184204.1329414-1-rdunlap@infradead.org
9 days  drm/bridge: imx8mp-hdmi-pai: enable PM runtime  (Shengjiu Wang)
There is an audio channel shift issue in the multi-channel case - the channel order is correct for the first run, but shifted for the second run. The fix is to reset the PAI interface at the end of playback. The reset can be handled by PM runtime, so enable PM runtime. Fixes: 0205fae6327a ("drm/bridge: imx: add driver for HDMI TX Parallel Audio Interface") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Reviewed-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Liu Ying <victor.liu@nxp.com> Link: https://lore.kernel.org/r/20260130080910.3532724-1-shengjiu.wang@nxp.com
9 days  ALSA: hda/realtek: Enable headset mic for Acer Nitro 5  (Breno Baptista)
Add quirk to support microphone input through headphone jack on Acer Nitro 5 AN515-57 (ALC295). Signed-off-by: Breno Baptista <brenomb07@gmail.com> Link: https://patch.msgid.link/20260205024341.26694-1-brenomb07@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 days  netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate()  (Andrew Fasano)
nft_map_catchall_activate() has an inverted element activity check compared to its non-catchall counterpart nft_mapelem_activate() and compared to what is logically required.

nft_map_catchall_activate() is called from the abort path to re-activate catchall map elements that were deactivated during a failed transaction. It should skip elements that are already active (they don't need re-activation) and process elements that are inactive (they need to be restored). Instead, the current code does the opposite: it skips inactive elements and processes active ones.

Compare the non-catchall activate callback, which is correct:

nft_mapelem_activate():
	if (nft_set_elem_active(ext, iter->genmask))
		return 0;	/* skip active, process inactive */

With the buggy catchall version:

nft_map_catchall_activate():
	if (!nft_set_elem_active(ext, genmask))
		continue;	/* skip inactive, process active */

The consequence is that when a DELSET operation is aborted, nft_setelem_data_activate() is never called for the catchall element. For NFT_GOTO verdict elements, this means nft_data_hold() is never called to restore the chain->use reference count. Each abort cycle permanently decrements chain->use. Once chain->use reaches zero, DELCHAIN succeeds and frees the chain while catchall verdict elements still reference it, resulting in a use-after-free.

This is exploitable for local privilege escalation from an unprivileged user via user namespaces + nftables on distributions that enable CONFIG_USER_NS and CONFIG_NF_TABLES.

Fix by removing the negation so the check matches nft_mapelem_activate(): skip active elements, process inactive ones.

Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase") Signed-off-by: Andrew Fasano <andrew.fasano@nist.gov> Signed-off-by: Florian Westphal <fw@strlen.de>
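For clarity, the corrected catchall loop body then reads like this (context trimmed; the surrounding iteration and call details are simplified, not the exact kernel diff):

	if (nft_set_elem_active(ext, genmask))
		continue;   /* already active: nothing to restore */
	/* inactive: re-activate, restoring chain->use for verdicts */
	nft_setelem_data_activate(net, set, elem);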
9 days  net/mlx5e: Extend TC max ratelimit using max_bw_value_msb  (Alexei Lazar)
The per-TC rate limit was restricted to 255 Gbps due to the 8-bit max_bw_value field in the QETC register. This limit is insufficient for newer, higher-bandwidth NICs. Extend the rate limit by using the full 16-bit max_bw_value field. This allows the finer 100Mbps granularity to be used for rates up to ~6.5 Tbps, instead of switching to 1Gbps granularity at higher rates. The extended range is only used when the device advertises support via the qetcr_qshr_max_bw_val_msb capability bit in the QCAM register. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260203073021.1710806-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
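Back-of-the-envelope for the two ranges (the unit encoding here is an assumption for illustration):

	#include <stdint.h>

	/* 8-bit field at 1 Gbps granularity:    255 * 1000 Mbps  =  255 Gbps  */
	/* 16-bit field at 100 Mbps granularity: 65535 * 100 Mbps ~= 6.55 Tbps */
	static const uint64_t old_max_mbps = 255ULL * 1000;
	static const uint64_t new_max_mbps = 65535ULL * 100;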
9 days  Merge branch 'net-mlx5e-rx-datapath-enhancements'  (Jakub Kicinski)
Tariq Toukan says:

====================
net/mlx5e: RX datapath enhancements

This series by Dragos introduces multiple RX datapath enhancements to the mlx5e driver. First patch adds SW handling for oversized packets in non-linear SKB mode. Second patch adds a reclaim mechanism to mitigate memory allocation failures with memory providers.
====================

Link: https://patch.msgid.link/20260203072130.1710255-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net/mlx5e: SHAMPO, Improve allocation recovery  (Dragos Tatulea)
When memory providers are used, there is a disconnect between the page_pool size and the available memory in the provider. This means that the page_pool can run out of memory if the user didn't provision a large enough buffer. Under these conditions, mlx5 gets stuck trying to allocate new buffers without being able to release existing buffers. This happens due to the optimization introduced in commit 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling") which delays WQE releases to increase the chance of page_pool direct recycling. The optimization was developed before memory providers existed, so this circumstance was not considered.

This patch unblocks the queue by reclaiming pages from WQEs that can be freed and doing a one-shot retry. A WQE can be freed when:

1) All its strides have been consumed (the WQE is no longer in the linked list).
2) The WQE pages/netmems have not been previously released.

This reclaim mechanism is useful for regular pages as well. Note that provisioning memory that can't fill even one MPWQE (64 4K pages) will still render the queue unusable. The same goes when the application doesn't release its buffers for various reasons, or a combination of the two: a very small buffer is provisioned, the application releases buffers in bulk, the bulk size is never reached => the queue is stuck.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260203072130.1710255-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
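A pseudocode sketch of the reclaim pass (every type and helper below is illustrative, not the mlx5e code):

	#include <stdbool.h>

	struct wqe;
	bool wqe_in_use(const struct wqe *w);         /* strides still consumed? */
	bool wqe_pages_released(const struct wqe *w); /* pages already freed?    */
	void wqe_release_pages(struct wqe *w);

	static bool reclaim_deferred_wqes(struct wqe **deferred, int n)
	{
		bool reclaimed = false;

		for (int i = 0; i < n; i++) {
			if (wqe_in_use(deferred[i]))         /* condition 1 unmet */
				continue;
			if (wqe_pages_released(deferred[i])) /* nothing to give back */
				continue;
			wqe_release_pages(deferred[i]);      /* back to the pool */
			reclaimed = true;
		}
		return reclaimed;   /* caller does the one-shot alloc retry */
	}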
9 days  net/mlx5e: RX, Drop oversized packets in non-linear mode  (Dragos Tatulea)
Currently the driver has an inconsistent behaviour between modes when it comes to oversized packets that are not dropped through the physical MTU check in HW. This can happen for Multi Host configurations where each port has a different MTU.

Current behavior:

1) Striding RQ in linear mode drops the packet in SW and counts it with oversize_pkts_sw_drop.
2) Striding RQ in non-linear mode allows it through like a normal packet.
3) Legacy RQ can't receive oversized packets by design: the RX WQE uses MTU sized packet buffers.

This inconsistency is not a violation of the netdev policy [1] but it is better to be consistent across modes. This patch aligns (2) with (1) and (3). One exception is added for LRO: don't drop the oversized packet if it is an LRO packet.

As rq->hw_mtu now always needs to be updated during the MTU change flow, drop the reset avoidance optimization from mlx5e_change_mtu(). Extract the CQE LRO segments reading into a helper function as it is used twice now.

[1] Documentation/networking/netdevices.rst#L205

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260203072130.1710255-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
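The check that aligns (2) with (1) can be sketched as follows (field and counter names follow the commit text; the surrounding datapath code is illustrative):

	/* Non-linear striding RQ, after reading the CQE byte count: */
	if (!lro_num_seg && cqe_bcnt > rq->hw_mtu) {
		rq->stats->oversize_pkts_sw_drop++;  /* match linear mode */
		goto drop;                           /* LRO packets exempt */
	}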
9 days  net: stmmac: remove support for lpi_intr_o  (Russell King (Oracle))
The dwmac databook for v3.74a states that lpi_intr_o is a sideband signal which should be used to ungate the application clock, and this signal is synchronous to the receive clock. The receive clock can run at 2.5, 25 or 125MHz depending on the media speed, and can stop under the control of the link partner. This means that the time it takes to clear is dependent on the negotiated media speed, and thus can be 8, 40, or 400ns after reading the LPI control and status register. It has been observed with some aggressive link partners that this clock can stop while lpi_intr_o is still asserted, meaning that the signal remains asserted for an indefinite period that the local system has no direct control over. The LPI interrupts will still be signalled through the main interrupt path in any case, and this path is not dependent on the receive clock. Thus, since we do not gate the application clock, and the chances of adding clock gating in the future are slim due to the clocks being ill-defined, lpi_intr_o serves no useful purpose. Remove the code which requests the interrupt, and all associated code. Reported-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> Tested-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com> # Renesas RZ/V2H board Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnJbt-00000007YYN-28nm@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  Merge branch 'net-stmmac-fix-serdes-power-methods'  (Jakub Kicinski)
Russell King says:

====================
net: stmmac: fix serdes power methods

The stmmac serdes powerup/powerdown methods are not guaranteed to be called in a balanced fashion, but they are used to call the generic PHY subsystem's phy_power_up() and phy_power_down() methods, which do require balanced calls. This series addresses this by making the stmmac serdes methods balanced.
====================

Link: https://patch.msgid.link/aYHHWm5UkD1JVa7D@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net: stmmac: move serdes power methods to stmmac_[open|release]()  (Russell King (Oracle))
Move the SerDes power up and down calls for the non-"after linkup" case out of __stmmac_open() and __stmmac_release() into the stmmac_open() and stmmac_release() methods, which means the SerDes will only change power state on administrative changes or suspend/resume, not while changing the interface MTU. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnDDt-00000007XxF-3uUK@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net: stmmac: add missing serdes power down in error paths  (Russell King (Oracle))
The open path is missing cleanup of a successful serdes power up if stmmac_hw_setup() or stmmac_request_irq() fails. stmmac_resume() is also missing cleanup of the serdes power up if stmmac_hw_setup() fails. Add the missing cleanups. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnDDo-00000007Xx9-3RZ8@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net: stmmac: add state tracking for legacy serdes power state  (Russell King (Oracle))
Avoid calling the serdes_powerdown() method if we have not had a preceding successful call to the serdes_powerup() method. This avoids unbalancing refcounted resources that may be used in these platform glue serdes methods. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnDDj-00000007Xx3-2xZ0@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
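A sketch of the state-tracked guard (the `serdes_powered` flag name is an assumption; the callback shape follows the platform-glue style):

	static void serdes_powerdown_guarded(struct net_device *ndev,
					     struct stmmac_priv *priv)
	{
		if (!priv->serdes_powered)  /* no successful powerup yet */
			return;             /* keep refcounted calls balanced */
		if (priv->plat->serdes_powerdown)
			priv->plat->serdes_powerdown(ndev, priv->plat->bsp_priv);
		priv->serdes_powered = false;
	}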
9 days  net: stmmac: add wrappers for serdes_power[up|down]() methods  (Russell King (Oracle))
Add wrappers for the serdes_power[up|down]() methods and update all call sites. This will allow us to add state tracking. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vnDDe-00000007Xww-2VUU@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  Merge branch 'net-rds-rds-tcp-protocol-and-extension-improvements'  (Jakub Kicinski)
Allison Henderson says:

====================
net/rds: RDS-TCP protocol and extension improvements

This is subset 3 of the larger RDS-TCP patch series I posted last Oct. The greater series aims to correct multiple rds-tcp issues that can cause dropped or out-of-sequence messages. I've broken it down into smaller sets to make reviews more manageable. In this set, we introduce extension headers for byte accounting and fix several RDS/TCP protocol issues, including message preservation during connection transitions and multipath lane handling.

The entire set can be viewed in the rfc here: https://lore.kernel.org/netdev/20251022191715.157755-1-achender@kernel.org/
====================

Link: https://patch.msgid.link/20260203055723.1085751-1-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net/rds: Trigger rds_send_ping() more than once  (Gerd Rausch)
Even though a peer may have already received a non-zero value for "RDS_EXTHDR_NPATHS" from a node in the past, the current peer may not. Therefore it is important to initiate another rds_send_ping() after a re-connect to any peer: It is unknown at that time if we're still talking to the same instance of RDS kernel modules on the other side. Otherwise, the peer may just operate on a single lane ("c_npaths == 0"), not knowing that more lanes are available. However, if "c_with_sport_idx" is supported, we also need to check that the connection we accepted on lane#0 meets the proper source port modulo requirement, as we fan out: Since the exchange of "RDS_EXTHDR_NPATHS" and "RDS_EXTHDR_SPORT_IDX" is asynchronous, initially we have no choice but to accept an incoming connection (via "accept") in the first slot ("cp_index == 0") for backwards compatibility. But that very connection may have come from a different lane with "cp_index != 0", since the peer thought that we already understood and handled "c_with_sport_idx" properly, as indicated by a previous exchange before a module was reloaded. In short: If a module gets reloaded, we recover from that, but do *not* allow a downgrade to support fewer lanes. Downgrades would require us to merge messages from separate lanes, which is rather tricky with the current RDS design. Each lane has its own sequence number space and all messages would need to be re-sequenced as we merge, all while handling "RDS_FLAG_RETRANSMITTED" and "cp_retrans" properly. Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-9-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net/rds: Use the first lane until RDS_EXTHDR_NPATHS arrives  (Gerd Rausch)
Instead of just blocking the sender until "c_npaths" is known (it gets updated upon the receipt of an MPRDS PONG message), simply use the first lane (cp_index#0). But just using the first lane isn't enough. As soon as we enqueue messages on a different lane, we'd run the risk of out-of-order delivery of RDS messages. Earlier messages enqueued on "cp_index == 0" could be delivered later than more recent messages enqueued on "cp_index > 0", mostly because of possible head of line blocking issues causing the first lane to be slower. To avoid that, we simply take a snapshot of "cp_next_tx_seq" at the time we're about to fan-out to more lanes. Then we delay the transmission of messages enqueued on other lanes with "cp_index > 0" until cp_index#0 has caught up with the delivery of new messages (from "cp_send_queue") as well as in-flight messages (from "cp_retrans") that haven't been acknowledged yet by the receiver. We also add a new counter "mprds_catchup_tx0_retries" to keep track of how many times "rds_send_xmit" had to suspend activities, because it was waiting for the first lane to catch up. Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-8-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
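The gate can be sketched like this (helper and field names here are hypothetical, loosely following the commit text):

	/* Lanes other than 0 hold their traffic until lane 0 has pushed
	 * everything enqueued before the fan-out snapshot. */
	static bool lane_may_transmit(struct rds_conn_path *cp0,
				      struct rds_conn_path *cp,
				      u64 fanout_snapshot)
	{
		if (cp->cp_index == 0)
			return true;   /* lane 0 is never gated */
		if (cp0->cp_next_tx_seq >= fanout_snapshot &&
		    list_empty(&cp0->cp_retrans))  /* nothing in flight */
			return true;
		/* else: bump mprds_catchup_tx0_retries, retry later */
		return false;
	}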
9 days  net/rds: Update struct rds_statistics to use u64 instead of uint64_t  (Allison Henderson)
Quick clean up to avoid checkpatch errors when adding members to this struct (Prefer kernel type 'u64' over 'uint64_t'). No functional changes added. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-7-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net/rds: Clear reconnect pending bit  (Håkon Bugge)
When canceling the reconnect worker, care must be taken to reset the reconnect-pending bit. If the reconnect worker has not yet been scheduled before it is canceled, the reconnect-pending bit will stay on forever. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-6-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net/rds: Kick-start TCP receiver after accept  (Gerd Rausch)
In cases where the server (the node with the higher IP-address) in an RDS/TCP connection is overwhelmed it is possible that the socket that was just accepted is chock-full of messages, up to the limit of what the socket receive buffer permits. Subsequently, "rds_tcp_data_ready" won't be called anymore, because there is no more space to receive additional messages. Nor was it called prior to the point of calling "rds_tcp_set_callbacks", because the "sk_data_ready" pointer didn't even point to "rds_tcp_data_ready" yet. We fix this by simply kick-starting the receive-worker for all cases where the socket state is neither "TCP_CLOSE_WAIT" nor "TCP_CLOSE". Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-5-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
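The kick-start boils down to a state check plus scheduling the receive worker; a sketch (the queueing details are an assumption):

	/* After rds_tcp_set_callbacks(): data may already be queued, and
	 * sk_data_ready will not fire again for it, so drain explicitly. */
	if (sk->sk_state != TCP_CLOSE_WAIT && sk->sk_state != TCP_CLOSE)
		queue_delayed_work(rds_wq, &cp->cp_recv_w, 0);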
9 days  net/rds: rds_tcp_conn_path_shutdown must not discard messages  (Gerd Rausch)
RDS/TCP differs from RDS/RDMA in that message acknowledgment is done based on TCP sequence numbers: As soon as the last byte of a message has been acknowledged by the TCP stack of a peer, rds_tcp_write_space() goes on to discard prior messages from the send queue. Which is fine, for as long as the receiver never throws any messages away. The dequeuing of messages in RDS/TCP is done either from the "sk_data_ready" callback pointing to rds_tcp_data_ready() (the most common case), or from the receive worker pointing to rds_tcp_recv_path() which is called for as long as the connection is "RDS_CONN_UP". However, as soon as rds_conn_path_drop() is called for whatever reason, including "DR_USER_RESET", "cp_state" transitions to "RDS_CONN_ERROR", and rds_tcp_restore_callbacks() ends up restoring the callbacks and thereby disabling message receipt. So messages already acknowledged to the sender were dropped. Furthermore, the "->shutdown" callback was always called with an invalid parameter ("RCV_SHUTDOWN | SEND_SHUTDOWN == 3"), instead of the correct pre-increment value ("SHUT_RDWR == 2"). inet_shutdown() returns "-EINVAL" in such cases, rendering this call a NOOP. So we change rds_tcp_conn_path_shutdown() to do the proper "->shutdown(SHUT_WR)" call in order to signal EOF to the peer and make it transition to "TCP_CLOSE_WAIT" (RFC 793). This should make the peer also enter rds_tcp_conn_path_shutdown() and do the same. This allows us to dequeue all messages already received and acknowledged to the peer. We do so, until we know that the receive queue no longer has data (skb_queue_empty()) and that we couldn't have any data in flight anymore, because the socket transitioned to any of the states "CLOSING", "TIME_WAIT", "CLOSE_WAIT", "LAST_ACK", or "CLOSE" (RFC 793). However, if we do just that, we suddenly see duplicate RDS messages being delivered to the application. So what gives? Turns out that with MPRDS and its multitude of backend connections, retransmitted messages ("RDS_FLAG_RETRANSMITTED") can outrace the dequeuing of their original counterparts. And the duplicate check implemented in rds_recv_local() only discards duplicates if flag "RDS_FLAG_RETRANSMITTED" is set. Rather curious, because a duplicate is a duplicate; it shouldn't matter which copy is looked at and delivered first. To avoid this entire situation, we simply make the sender discard messages from the send-queue right from within rds_tcp_conn_path_shutdown(). Just like rds_tcp_write_space() would have done, were it called in time or still called. This makes sure that we no longer have messages that we know the receiver already dequeued sitting in our send-queue, and therefore avoid the entire "RDS_FLAG_RETRANSMITTED" fiasco. Now we got rid of the duplicate RDS message delivery, but we still run into cases where RDS messages are dropped. This time it is due to the delayed setting of the socket-callbacks in rds_tcp_accept_one() via either rds_tcp_reset_callbacks() or rds_tcp_set_callbacks(). By the time rds_tcp_accept_one() gets there, the socket may already have transitioned into state "TCP_CLOSE_WAIT", but rds_tcp_state_change() was never called. Subsequently, "->shutdown(SHUT_WR)" did not happen either. So the peer ends up getting stuck in state "TCP_FIN_WAIT2". We fix that by checking for states "TCP_CLOSE_WAIT", "TCP_LAST_ACK", or "TCP_CLOSE" and drop the freshly accepted socket in that case. 
This problem is observable by running "rds-stress --reset" frequently on either of the two sides of an RDS connection, or on both, while other "rds-stress" processes are exchanging data. Those "rds-stress" processes reported out-of-sequence errors, with the expected sequence number being smaller than the one actually received (due to the dropped messages). Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-4-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
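The parameter mix-up, in numbers (RCV_SHUTDOWN/SEND_SHUTDOWN are kernel-internal flags; SHUT_* are the socket API values inet_shutdown() expects):

	/* RCV_SHUTDOWN | SEND_SHUTDOWN == 1 | 2 == 3; inet_shutdown()
	 * pre-increments and range-checks `how`, so 3 yields -EINVAL. */
	sock->ops->shutdown(sock, RCV_SHUTDOWN | SEND_SHUTDOWN); /* old: NOOP */
	sock->ops->shutdown(sock, SHUT_WR); /* new: half-close, peer sees EOF
					       and enters TCP_CLOSE_WAIT */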
9 days  net/rds: Encode cp_index in TCP source port  (Gerd Rausch)
Upon "sendmsg", RDS/TCP selects a backend connection based on a hash calculated from the source-port ("RDS_MPATH_HASH"). However, "rds_tcp_accept_one" accepts connections in the order they arrive, which is non-deterministic. Therefore the mapping of the sender's "cp->cp_index" to that of the receiver changes if the backend connections are dropped and reconnected. However, connection state that's preserved across reconnects (e.g. "cp_next_rx_seq") relies on that sender<->receiver mapping never changing. So we make sure that client and server of the TCP connection have the exact same "cp->cp_index" across reconnects by encoding "cp->cp_index" in the lower three bits of the client's TCP source port. A new extension "RDS_EXTHDR_SPORT_IDX" is introduced that allows the server to tell the difference between clients that do the "cp->cp_index" encoding, and legacy clients that pick source ports randomly. Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-3-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net/rds: new extension header: rdma bytes  (Shamir Rabinovitch)
Introduce a new extension header type RDSV3_EXTHDR_RDMA_BYTES for an RDMA initiator to exchange rdma byte counts with its target. Currently, RDMA operations cannot precisely account for how many bytes a peer just transferred via RDMA, which limits per-connection statistics and future policy (e.g., monitoring or rate/cgroup accounting of RDMA traffic). In this patch we expand rds_message_add_extension to accept multiple extensions, and add a new flag to the RDS header, RDS_FLAG_EXTHDR_EXTENSION, along with a new extension to the RDS header, rds_ext_header_rdma_bytes. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20260203055723.1085751-2-achender@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net_sched: sch_fq: tweak unlikely() hints in fq_dequeue()  (Eric Dumazet)
After 076433bd78d7 ("net_sched: sch_fq: add fast path for mostly idle qdisc") we need to remove one unlikely() because q->internal holds all the fast path packets.

	skb = fq_peek(&q->internal);
	if (unlikely(skb)) {
		q->internal.qlen--;

Calling INET_ECN_set_ce() is very unlikely. These changes allow fq_dequeue_skb() to be (auto)inlined, thus making fq_dequeue() faster.

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 2/2 grow/shrink: 0/1 up/down: 283/-269 (14)
Function                   old     new   delta
INET_ECN_set_ce              -     267    +267
__pfx_INET_ECN_set_ce        -      16     +16
__pfx_fq_dequeue_skb        16       -     -16
fq_dequeue_skb             103       -    -103
fq_dequeue                1685    1535    -150
Total: Before=24886569, After=24886583, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260203214716.880853-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  ovpn: Replace use of system_wq with system_percpu_wq  (Marco Crivellari)
This patch continues the effort to refactor workqueue APIs, which began with the changes introducing new workqueues and a new alloc_workqueue flag:

commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of workqueues to become unbound by default so that their workload placement is optimized by the scheduler. Before that can happen, and after a careful review and conversion of each individual case, workqueue users must be converted to the better named new workqueues with no intended behaviour changes:

system_wq -> system_percpu_wq
system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be removed in the future. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Acked-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20251224155006.114824-1-marco.crivellari@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net/iucv: clean up iucv kernel-doc warnings  (Randy Dunlap)
Fix numerous (many) kernel-doc warnings in iucv.[ch]:

- convert function documentation comments to a common (kernel-doc) look, even for static functions (without "/**")
- use matching parameter and parameter description names
- use better wording in function descriptions (Jakub & AI)
- remove duplicate kernel-doc comments from the header file (Jakub)

Examples:

Warning: include/net/iucv/iucv.h:210 missing initial short description on line: * iucv_unregister
Warning: include/net/iucv/iucv.h:216 function parameter 'handle' not described in 'iucv_unregister'
Warning: include/net/iucv/iucv.h:467 function parameter 'answer' not described in 'iucv_message_send2way'
Warning: net/iucv/iucv.c:727 missing initial short description on line: * iucv_cleanup_queue

Build-tested with both "make htmldocs" and "make ARCH=s390 defconfig all". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20260203075248.1177869-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  net: stmmac: Fix typo from clk_scr_i to clk_csr_i  (Huacai Chen)
In include/linux/stmmac.h clk_csr_i is spelled as clk_scr_i by mistake, so correct the typo. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn> Link: https://patch.msgid.link/20260203062658.2156653-1-chenhuacai@loongson.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  tcp: split tcp_check_space() in two parts  (Eric Dumazet)
tcp_check_space() is fat and not inlined. Move its slow path into an out-of-line __tcp_check_space() and make tcp_check_space() an inline function for better TCP performance.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 2/2 grow/shrink: 4/0 up/down: 708/-582 (126)
Function                     old     new   delta
__tcp_check_space              -     521    +521
tcp_rcv_established         1860    1916     +56
tcp_rcv_state_process       3342    3384     +42
tcp_event_new_data_sent      248     286     +38
tcp_data_snd_check            71     106     +35
__pfx___tcp_check_space        -      16     +16
__pfx_tcp_check_space         16       -     -16
tcp_check_space              566       -    -566
Total: Before=24896373, After=24896499, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260203050932.3522221-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
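The general shape of the split (the names below are illustrative, not the actual tcp code): the cheap predicate is inlined at every call site, while the fat part stays out of line.

	#define unlikely(x) __builtin_expect(!!(x), 0)

	struct sock;
	int need_space_work(const struct sock *sk);  /* cheap test (assumed) */
	void __check_space_slow(struct sock *sk);    /* out-of-line slow path */

	static inline void check_space(struct sock *sk)
	{
		if (unlikely(need_space_work(sk)))
			__check_space_slow(sk);
	}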
9 days  tcp: move tcp_rbtree_insert() to tcp_output.c  (Eric Dumazet)
tcp_rbtree_insert() is primarily used from tcp_output.c. In tcp_input.c, only (slow path) tcp_collapse() uses it. Move it to tcp_output.c to allow its (auto)inlining to improve the TCP tx fast path.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 4/1 up/down: 445/-115 (330)
Function                     old     new   delta
tcp_connect                 4277    4478    +201
tcp_event_new_data_sent      162     248     +86
tcp_send_synack              780     862     +82
tcp_fragment                1185    1261     +76
tcp_collapse                1524    1409    -115
Total: Before=24896043, After=24896373, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260203045110.3499713-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  tcp: use __skb_push() in __tcp_transmit_skb()  (Eric Dumazet)
We trust MAX_TCP_HEADER to be large enough. Using the inlined version of skb_push() trades 8 bytes of text for better performance of the TCP TX fast path.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 1/0 up/down: 8/0 (8)
Function               old     new   delta
__tcp_transmit_skb    3181    3189      +8
Total: Before=24896035, After=24896043, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260203044226.3489941-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
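The difference between the two helpers, simplified from the skbuff ones: __skb_push() omits the underflow check, which is safe here because MAX_TCP_HEADER bytes were reserved up front.

	static inline unsigned char *sketch__skb_push(struct sk_buff *skb,
						      unsigned int len)
	{
		skb->data -= len;
		skb->len  += len;
		return skb->data;  /* skb_push() would additionally verify
				      skb->data >= skb->head and panic */
	}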
9 days  Merge tag 'wireless-next-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next  (Jakub Kicinski)
Johannes Berg says:

====================
Some more changes, including pulls from drivers:

- ath drivers: small features/cleanups
- rtw drivers: mostly refactoring for rtw89 RTL8922DE support
- mac80211: use hrtimers for CAC to avoid too long delays
- cfg80211/mac80211: some initial UHR (Wi-Fi 8) support

* tag 'wireless-next-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (59 commits)
  wifi: brcmsmac: phy: Remove unreachable error handling code
  wifi: mac80211: Add eMLSR/eMLMR action frame parsing support
  wifi: mac80211: add initial UHR support
  wifi: cfg80211: add initial UHR support
  wifi: ieee80211: add some initial UHR definitions
  wifi: mac80211: use wiphy_hrtimer_work for CAC timeout
  wifi: mac80211: correct ieee80211-{s1g/eht}.h include guard comments
  wifi: ath12k: clear stale link mapping of ahvif->links_map
  wifi: ath12k: Add support TX hardware queue stats
  wifi: ath12k: Add support RX PDEV stats
  wifi: ath12k: Fix index decrement when array_len is zero
  wifi: ath12k: support OBSS PD configuration for AP mode
  wifi: ath12k: add WMI support for spatial reuse parameter configuration
  dt-bindings: net: wireless: ath11k-pci: deprecate 'firmware-name' property
  wifi: ath11k: add usecase firmware handling based on device compatible
  wifi: ath10k: sdio: add missing lock protection in ath10k_sdio_fw_crashed_dump()
  wifi: ath10k: fix lock protection in ath10k_wmi_event_peer_sta_ps_state_chg()
  wifi: ath10k: snoc: support powering on the device via pwrseq
  wifi: rtw89: pci: warn if SPS OCP happens for RTL8922DE
  wifi: rtw89: pci: restore LDO setting after device resume
  ...
====================

Link: https://patch.msgid.link/20260204121143.181112-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  Merge tag 'wireless-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless  (Jakub Kicinski)
Johannes Berg says:

====================
Two last-minute iwlwifi fixes:

- cancel mlo_scan_work on disassoc to avoid use-after-free/init-after-queue issues
- pause TCM work on suspend to avoid crashing the FW (and sometimes the host) on resume with traffic

* tag 'wireless-2026-02-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: pause TCM on fast resume
  wifi: iwlwifi: mld: cancel mlo_scan_start_wk
====================

Link: https://patch.msgid.link/20260204113547.159742-4-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  selftests: ublk: organize test directories by test ID  (Ming Lei)
Set UBLK_TEST_DIR to ${TMPDIR:-./ublktest-dir}/${TID}.XXXXXX to create per-test subdirectories organized by test ID. This makes it easier to identify and debug specific test runs. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 days  block: decouple secure erase size limit from discard size limit  (Luke Wang)
Secure erase should use max_secure_erase_sectors instead of being limited by max_discard_sectors. Separate the handling of REQ_OP_SECURE_ERASE from REQ_OP_DISCARD to allow each operation to use its own size limit. Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
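A sketch of the per-op limit selection (the queue-limit fields are the block layer's; the surrounding code is illustrative):

	switch (bio_op(bio)) {
	case REQ_OP_SECURE_ERASE:
		max_sectors = q->limits.max_secure_erase_sectors;
		break;
	case REQ_OP_DISCARD:
		max_sectors = q->limits.max_discard_sectors;
		break;
	default:
		break;
	}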
9 days  Merge branch 'mptcp-misc-features-for-v6-20-7-0'  (Jakub Kicinski)
Matthieu Baerts says:

====================
mptcp: misc. features for v6.20/7.0

This series contains a few independent new features, and small fixes for net-next:

- Patches 1-2: two small fixes linked to the MPTCP receive buffer that are not urgent, requiring code that has been recently changed, and is needed for the next patch. Because we are at the end of the cycle, it seems easier to send them to net-next, instead of dealing with conflicts between net and net-next.
- Patch 3: a refactoring to simplify the code around MPTCP DRS.
- Patch 4: a new trace event for MPTCP to help debugging receive buffer auto-tuning issues.
- Patch 5: align internal MPTCP PM structure with NL specs, just to manipulate the same thing.
- Patch 6: convert some min_t(int, ...) to min(): cleaner, and to avoid future warnings.
- Patch 7: [removed]
- Patch 8: sort all #include in MPTCP Diag tool in the selftests to prevent future potential conflicts and ease the reading.
- Patches 9-11: improve the MPTCP Join selftest by waiting for an event instead of a "random" sleep.
- Patches 12-14: some small cleanups in the selftests, seen while working on the previous patches.
- Patch 15: avoid marking subtests as skipped while still validating most checks when executing the last MPTCP selftests on older kernels.
====================

Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-0-31ec8bfc56d1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  selftests: mptcp: join: no SKIP mark for group checks  (Matthieu Baerts (NGI0))
When executing the last MPTCP selftests on older kernels, this output is printed:

# 001 no JOIN
#     join Rx      [SKIP]
#     join Tx      [SKIP]
#     fallback     [SKIP]

In fact, behind each line, a few counters are checked, and likely not all of them have been skipped because they are not available on these kernels. Instead, "new" and unsupported counters for these groups are now ignored, and [ OK ] will be printed instead of [SKIP]. Note that on the MPTCP CI, when validating the dev versions, any unsupported counter will cause the tests to fail. So it is safe not to print 'SKIP' for these group checks. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-15-31ec8bfc56d1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days  selftests: mptcp: connect cleanup TFO setup  (Matthieu Baerts (NGI0))
For TFO, only the file descriptor is needed; the family is not. Also, the error can be handled the same way whether 'sendto()' or 'connect()' is used: only the printed error message differs. This avoids a bit of confusion. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-14-31ec8bfc56d1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>