summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2023-10-13net: Handle bulk delete policy in bridge driverAmit Cohen
The merge commit 92716869375b ("Merge branch 'br-flush-filtering'") added support for FDB flushing in bridge driver. The following patches will extend VXLAN driver to support FDB flushing as well. The netlink message for bulk delete is shared between the drivers. With the existing implementation, there is no way to prevent user from flushing with attributes that are not supported per driver. For example, when VNI will be added, user will not get an error for flush FDB entries in bridge with VNI, although this attribute is not relevant for bridge. As preparation for support of FDB flush in VXLAN driver, move the policy to be handled in bridge driver, later a new policy for VXLAN will be added in VXLAN driver. Do not pass 'vid' as part of ndo_fdb_del_bulk(), as this field is relevant only for bridge. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: kernel/bpf/verifier.c 829955981c55 ("bpf: Fix verifier log for async callback return values") a923819fb2c5 ("bpf: Treat first argument as return value for bpf_throw") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-12Merge tag 'net-6.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN and BPF. We have a regression in TC currently under investigation, otherwise the things that stand off most are probably the TCP and AF_PACKET fixes, with both issues coming from 6.5. Previous releases - regressions: - af_packet: fix fortified memcpy() without flex array. - tcp: fix crashes trying to free half-baked MTU probes - xdp: fix zero-size allocation warning in xskq_create() - can: sja1000: always restart the tx queue after an overrun - eth: mlx5e: again mutually exclude RX-FCS and RX-port-timestamp - eth: nfp: avoid rmmod nfp crash issues - eth: octeontx2-pf: fix page pool frag allocation warning Previous releases - always broken: - mctp: perform route lookups under a RCU read-side lock - bpf: s390: fix clobbering the caller's backchain in the trampoline - phy: lynx-28g: cancel the CDR check work item on the remove path - dsa: qca8k: fix qca8k driver for Turris 1.x - eth: ravb: fix use-after-free issue in ravb_tx_timeout_work() - eth: ixgbe: fix crash with empty VF macvlan list" * tag 'net-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) rswitch: Fix imbalance phy_power_off() calling rswitch: Fix renesas_eth_sw_remove() implementation octeontx2-pf: Fix page pool frag allocation warning nfc: nci: assert requested protocol is valid af_packet: Fix fortified memcpy() without flex array. net: tcp: fix crashes trying to free half-baked MTU probes net/smc: Fix pos miscalculation in statistics nfp: flower: avoid rmmod nfp crash issues net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read ethtool: Fix mod state of verbose no_mask bitset net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn() mctp: perform route lookups under a RCU read-side lock net: skbuff: fix kernel-doc typos s390/bpf: Fix unwinding past the trampoline s390/bpf: Fix clobbering the caller's backchain in the trampoline net/mlx5e: Again mutually exclude RX-FCS and RX-port-timestamp net/smc: Fix dependency of SMC on ISM ixgbe: fix crash with empty VF macvlan list net/mlx5e: macsec: use update_pn flag instead of PN comparation net: phy: mscc: macsec: reject PN update requests ...
2023-10-11netdev: replace napi_reschedule with napi_scheduleChristian Marangi
Now that napi_schedule return a bool, we can drop napi_reschedule that does the same exact function. The function comes from a very old commit bfe13f54f502 ("ibm_emac: Convert to use napi_struct independent of struct net_device") and the purpose is actually deprecated in favour of different logic. Convert every user of napi_reschedule to napi_schedule. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> # ath10k Acked-by: Nick Child <nnac123@linux.ibm.com> # ibm Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for can/dev/rx-offload.c Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20231009133754.9834-3-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11netdev: make napi_schedule return bool on NAPI successful scheduleChristian Marangi
Change napi_schedule to return a bool on NAPI successful schedule. This might be useful for some driver to do additional steps after a NAPI has been scheduled. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231009133754.9834-2-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11Merge tag 'fs_for_v6.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota regression fix from Jan Kara. * tag 'fs_for_v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Fix slow quotaoff
2023-10-11net/core: Introduce netdev_core_stats_inc()Yajun Deng
Although there is a kfree_skb_reason() helper function that can be used to find the reason why this skb is dropped, but most callers didn't increase one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped. For the users, people are more concerned about why the dropped in ip is increasing. Introduce netdev_core_stats_inc() for trace the caller of dev_core_stats_*_inc(). Also, add __code to netdev_core_stats_alloc(), as it's called with small probability. And add noinline make sure netdev_core_stats_inc was never inlined. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-10net: skbuff: fix kernel-doc typosRandy Dunlap
Correct punctuation and drop an extraneous word. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231008214121.25940-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-06Merge tag 'wireless-next-2023-10-06' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.7 The first pull request for v6.7, with both stack and driver changes. We have a big change how locking is handled in cfg80211 and mac80211 which removes several locks and hopefully simplifies the locking overall. In drivers rtw89 got MCC support and smaller features to other active drivers but nothing out of ordinary. Major changes: cfg80211 - remove wdev mutex, use the wiphy mutex instead - annotate iftype_data pointer with sparse - first kunit tests, for element defrag - remove unused scan_width support mac80211 - major locking rework, remove several locks like sta_mtx, key_mtx etc. and use the wiphy mutex instead - remove unused shifted rate support - support antenna control in frame injection (requires driver support) - convert RX_DROP_UNUSABLE to more detailed reason codes rtw89 - TDMA-based multi-channel concurrency (MCC) support iwlwifi - support set_antenna() operation - support frame injection antenna control ath12k - WCN7850: enable 320 MHz channels in 6 GHz band - WCN7850: hardware rfkill support - WCN7850: enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to make scan faster ath11k - add chip id board name while searching board-2.bin * tag 'wireless-next-2023-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (272 commits) wifi: rtlwifi: remove unreachable code in rtl92d_dm_check_edca_turbo() wifi: rtw89: debug: txpwr table supports Wi-Fi 7 chips wifi: rtw89: debug: show txpwr table according to chip gen wifi: rtw89: phy: set TX power RU limit according to chip gen wifi: rtw89: phy: set TX power limit according to chip gen wifi: rtw89: phy: set TX power offset according to chip gen wifi: rtw89: phy: set TX power by rate according to chip gen wifi: rtw89: mac: get TX power control register according to chip gen wifi: rtlwifi: use unsigned long for rtl_bssid_entry timestamp wifi: rtlwifi: fix EDCA limit set by BT coexistence wifi: rt2x00: fix MT7620 low RSSI issue wifi: rtw89: refine bandwidth 160MHz uplink OFDMA performance wifi: rtw89: refine uplink trigger based control mechanism wifi: rtw89: 8851b: update TX power tables to R34 wifi: rtw89: 8852b: update TX power tables to R35 wifi: rtw89: 8852c: update TX power tables to R67 wifi: rtw89: regd: configure Thailand in regulation type wifi: mac80211: add back SPDX identifier wifi: mac80211: fix ieee80211_drop_unencrypted_mgmt return type/value wifi: rtlwifi: cleanup few rtlxxxx_set_hw_reg() routines ... ==================== Link: https://lore.kernel.org/r/87jzrz6bvw.fsf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-06net: phy: broadcom: add support for BCM5221 phyGiulio Benetti
This patch adds the BCM5221 PHY support by reusing brcm_fet_*() callbacks and adding quirks for BCM5221 when needed. Cc: Jim Reinhart <jimr@tekvox.com> Cc: James Autry <jautry@tekvox.com> Cc: Matthew Maron <matthewm@tekvox.com> Signed-off-by: Giulio Benetti <giulio.benetti+tekvox@benettiengineering.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231005182915.153815-1-giulio.benetti@benettiengineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-06Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== i40e: House-keeping and clean-up Ivan Vecera says: The series makes some house-keeping tasks on i40e driver: Patch 1: Removes unnecessary back pointer from i40e_hw Patch 2: Moves I40E_MASK macro to i40e_register.h where is used Patch 3: Refactors I40E_MDIO_CLAUSE* to use the common macro Patch 4: Add header dependencies to <linux/avf/virtchnl.h> Patch 5: Simplifies memory alloction functions Patch 6: Moves mem alloc structures to i40e_alloc.h Patch 7: Splits i40e_osdep.h to i40e_debug.h and i40e_io.h Patch 8: Removes circular header deps, fixes and cleans headers Patch 9: Moves DDP specific macros and structs to i40e_ddp.c * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: Move DDP specific macros and structures to i40e_ddp.c i40e: Remove circular header dependencies and fix headers i40e: Split i40e_osdep.h i40e: Move memory allocation structures to i40e_alloc.h i40e: Simplify memory allocation functions virtchnl: Add header dependencies i40e: Refactor I40E_MDIO_CLAUSE* macros i40e: Move I40E_MASK macro to i40e_register.h i40e: Remove back pointer from i40e_hw structure ==================== Link: https://lore.kernel.org/r/20231005162850.3218594-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-06Merge tag 'linux-can-next-for-6.7-20231005' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2023-10-05 The first patch is by Miquel Raynal and fixes a comment in the sja1000 driver. Vincent Mailhol contributes 2 patches that fix W=1 compiler warnings in the etas_es58x driver. Jiapeng Chong's patch removes an unneeded NULL pointer check before dev_put() in the CAN raw protocol. A patch by Justin Stittreplaces a strncpy() by strscpy() in the peak_pci sja1000 driver. The next 5 patches are by me and fix the can_restart() handler and replace BUG_ON()s in the CAN dev helpers with proper error handling. The last 27 patches are also by me and target the at91_can driver. First a new helper function is introduced, the at91_can driver is cleaned up and updated to use the rx-offload helper. * tag 'linux-can-next-for-6.7-20231005' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (37 commits) can: at91_can: switch to rx-offload implementation can: at91_can: at91_alloc_can_err_skb() introduce new function can: at91_can: at91_irq_err_line(): send error counters with state change can: at91_can: at91_irq_err_line(): make use of can_change_state() and can_bus_off() can: at91_can: at91_irq_err_line(): take reg_sr into account for bus off can: at91_can: at91_irq_err_line(): make use of can_state_get_by_berr_counter() can: at91_can: at91_irq_err(): rename to at91_irq_err_line() can: at91_can: at91_irq_err_frame(): move next to at91_irq_err() can: at91_can: at91_irq_err_frame(): call directly from IRQ handler can: at91_can: at91_poll_err(): increase stats even if no quota left or OOM can: at91_can: at91_poll_err(): fold in at91_poll_err_frame() can: at91_can: add CAN transceiver support can: at91_can: at91_open(): forward request_irq()'s return value in case or an error can: at91_can: at91_chip_start(): don't disable IRQs twice can: at91_can: at91_set_bittiming(): demote register output to debug level can: at91_can: rename struct at91_priv::{tx_next,tx_echo} to {tx_head,tx_tail} can: at91_can: at91_setup_mailboxes(): update comments can: at91_can: add more register definitions can: at91_can: MCR Register: convert to FIELD_PREP() can: at91_can: MSR Register: convert to FIELD_PREP() ... ==================== Link: https://lore.kernel.org/r/20231005195812.549776-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-06Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A typo fix for a PMU driver, a workround for a side-channel erratum on Cortex-A520 and a fix for the local timer save/restore when using ACPI with Qualcomm's custom CPUs: - Workaround for Cortex-A520 erratum #2966298 - Fix typo in Arm CMN PMU driver that breaks counter overflow handling - Fix timer handling across idle for Qualcomm custom CPUs" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: cpuidle, ACPI: Evaluate LPI arch_flags for broadcast timer arm64: errata: Add Cortex-A520 speculative unprivileged load workaround arm64: Add Cortex-A520 CPU part definition perf/arm-cmn: Fix the unhandled overflow status of counter 4 to 7
2023-10-06Merge wireless into wireless-nextJohannes Berg
Resolve several conflicts, mostly between changes/fixes in wireless and the locking rework in wireless-next. One of the conflicts actually shows a bug in wireless that we'll want to fix separately. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Kalle Valo <kvalo@kernel.org>
2023-10-06quota: Fix slow quotaoffJan Kara
Eric has reported that commit dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") heavily increases runtime of generic/270 xfstest for ext4 in nojournal mode. The reason for this is that ext4 in nojournal mode leaves dquots dirty until the last dqput() and thus the cleanup done in quota_release_workfn() has to write them all. Due to the way quota_release_workfn() is written this results in synchronize_srcu() call for each dirty dquot which makes the dquot cleanup when turning quotas off extremely slow. To be able to avoid synchronize_srcu() for each dirty dquot we need to rework how we track dquots to be cleaned up. Instead of keeping the last dquot reference while it is on releasing_dquots list, we drop it right away and mark the dquot with new DQ_RELEASING_B bit instead. This way we can we can remove dquot from releasing_dquots list when new reference to it is acquired and thus there's no need to call synchronize_srcu() each time we drop dq_list_lock. References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64 Reported-by: Eric Whitney <enwlinux@gmail.com> Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2023-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-05can: dev: add can_state_get_by_berr_counter() to return the CAN state based ↵Marc Kleine-Budde
on the current error counters Some CAN controllers do not have a register that contains the current CAN state, but only a register that contains the error counters. Introduce a new function can_state_get_by_berr_counter() that returns the current TX and RX state depending on the provided CAN bit error counters. Link: https://lore.kernel.org/all/20231005-at91_can-rx_offload-v2-1-9987d53600e0@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-10-05Merge tag 'net-6.6-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth, netfilter, BPF and WiFi. I didn't collect precise data but feels like we've got a lot of 6.5 fixes here. WiFi fixes are most user-awaited. Current release - regressions: - Bluetooth: fix hci_link_tx_to RCU lock usage Current release - new code bugs: - bpf: mprog: fix maximum program check on mprog attachment - eth: ti: icssg-prueth: fix signedness bug in prueth_init_tx_chns() Previous releases - regressions: - ipv6: tcp: add a missing nf_reset_ct() in 3WHS handling - vringh: don't use vringh_kiov_advance() in vringh_iov_xfer(), it doesn't handle zero length like we expected - wifi: - cfg80211: fix cqm_config access race, fix crashes with brcmfmac - iwlwifi: mvm: handle PS changes in vif_cfg_changed - mac80211: fix mesh id corruption on 32 bit systems - mt76: mt76x02: fix MT76x0 external LNA gain handling - Bluetooth: fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER - l2tp: fix handling of transhdrlen in __ip{,6}_append_data() - dsa: mv88e6xxx: avoid EEPROM timeout when EEPROM is absent - eth: stmmac: fix the incorrect parameter after refactoring Previous releases - always broken: - net: replace calls to sock->ops->connect() with kernel_connect(), prevent address rewrite in kernel_bind(); otherwise BPF hooks may modify arguments, unexpectedly to the caller - tcp: fix delayed ACKs when reads and writes align with MSS - bpf: - verifier: unconditionally reset backtrack_state masks on global func exit - s390: let arch_prepare_bpf_trampoline return program size, fix struct_ops offsets - sockmap: fix accounting of available bytes in presence of PEEKs - sockmap: reject sk_msg egress redirects to non-TCP sockets - ipv4/fib: send netlink notify when delete source address routes - ethtool: plca: fix width of reads when parsing netlink commands - netfilter: nft_payload: rebuild vlan header on h_proto access - Bluetooth: hci_codec: fix leaking memory of local_codecs - eth: intel: ice: always add legacy 32byte RXDID in supported_rxdids - eth: stmmac: - dwmac-stm32: fix resume on STM32 MCU - remove buggy and unneeded stmmac_poll_controller, depend on NAPI - ibmveth: always recompute TCP pseudo-header checksum, fix use of the driver with Open vSwitch - wifi: - rtw88: rtw8723d: fix MAC address offset in EEPROM - mt76: fix lock dependency problem for wed_lock - mwifiex: sanity check data reported by the device - iwlwifi: ensure ack flag is properly cleared - iwlwifi: mvm: fix a memory corruption due to bad pointer arithm - iwlwifi: mvm: fix incorrect usage of scan API Misc: - wifi: mac80211: work around Cisco AP 9115 VHT MPDU length" * tag 'net-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) MAINTAINERS: update Matthieu's email address mptcp: userspace pm allow creating id 0 subflow mptcp: fix delegated action races net: stmmac: remove unneeded stmmac_poll_controller net: lan743x: also select PHYLIB net: ethernet: mediatek: disable irq before schedule napi net: mana: Fix oversized sge0 for GSO packets net: mana: Fix the tso_bytes calculation net: mana: Fix TX CQE error handling netlink: annotate data-races around sk->sk_err sctp: update hb timer immediately after users change hb_interval sctp: update transport state when processing a dupcook packet tcp: fix delayed ACKs for MSS boundary condition tcp: fix quick-ack counting to count actual ACKs of new data page_pool: fix documentation typos tipc: fix a potential deadlock on &tx->lock net: stmmac: dwmac-stm32: fix resume on STM32 MCU ipv4: Set offload_failed flag in fibmatch results netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs ...
2023-10-05virtchnl: Add header dependenciesIvan Vecera
The <linux/avf/virtchnl.h> uses BIT, struct_size and ETH_ALEN macros but does not include appropriate header files that defines them. Add these dependencies so this header file can be included anywhere. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-10-04Merge tag 'nf-23-10-04' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter patches for net First patch resolves a regression with vlan header matching, this was broken since 6.5 release. From myself. Second patch fixes an ancient problem with sctp connection tracking in case INIT_ACK packets are delayed. This comes with a selftest, both patches from Xin Long. Patch 4 extends the existing nftables audit selftest, from Phil Sutter. Patch 5, also from Phil, avoids a situation where nftables would emit an audit record twice. This was broken since 5.13 days. Patch 6, from myself, avoids spurious insertion failure if we encounter an overlapping but expired range during element insertion with the 'nft_set_rbtree' backend. This problem exists since 6.2. * tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs selftests: netfilter: Extend nft_audit.sh selftests: netfilter: test for sctp collision processing in nf_conntrack netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp netfilter: nft_payload: rebuild vlan header on h_proto access ==================== Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04cpuidle, ACPI: Evaluate LPI arch_flags for broadcast timerOza Pawandeep
Arm® Functional Fixed Hardware Specification defines LPI states, which provide an architectural context loss flags field that can be used to describe the context that might be lost when an LPI state is entered. - Core context Lost - General purpose registers. - Floating point and SIMD registers. - System registers, include the System register based - generic timer for the core. - Debug register in the core power domain. - PMU registers in the core power domain. - Trace register in the core power domain. - Trace context loss - GICR - GICD Qualcomm's custom CPUs preserves the architectural state, including keeping the power domain for local timers active. when core is power gated, the local timers are sufficient to wake the core up without needing broadcast timer. The patch fixes the evaluation of cpuidle arch_flags, and moves only to broadcast timer if core context lost is defined in ACPI LPI. Fixes: a36a7fecfe60 ("ACPI / processor_idle: Add support for Low Power Idle(LPI) states") Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Oza Pawandeep <quic_poza@quicinc.com> Link: https://lore.kernel.org/r/20231003173333.2865323-1-quic_poza@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2023-10-04Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-10-02 We've added 11 non-merge commits during the last 12 day(s) which contain a total of 12 files changed, 176 insertions(+), 41 deletions(-). The main changes are: 1) Fix BPF verifier to reset backtrack_state masks on global function exit as otherwise subsequent precision tracking would reuse them, from Andrii Nakryiko. 2) Several sockmap fixes for available bytes accounting, from John Fastabend. 3) Reject sk_msg egress redirects to non-TCP sockets given this is only supported for TCP sockets today, from Jakub Sitnicki. 4) Fix a syzkaller splat in bpf_mprog when hitting maximum program limits with BPF_F_BEFORE directive, from Daniel Borkmann and Nikolay Aleksandrov. 5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust size_index for selecting a bpf_mem_cache, from Hou Tao. 6) Fix arch_prepare_bpf_trampoline return code for s390 JIT, from Song Liu. 7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off, from Leon Hwang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use kmalloc_size_roundup() to adjust size_index selftest/bpf: Add various selftests for program limits bpf, mprog: Fix maximum program check on mprog attachment bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets bpf, sockmap: Add tests for MSG_F_PEEK bpf, sockmap: Do not inc copied_seq when PEEK flag set bpf: tcp_read_skb needs to pop skb regardless of seq bpf: unconditionally reset backtrack_state masks on global func exit bpf: Fix tr dereferencing selftests/bpf: Check bpf_cubic_acked() is called via struct_ops s390/bpf: Let arch_prepare_bpf_trampoline return program size ==================== Link: https://lore.kernel.org/r/20231002113417.2309-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04netfilter: handle the connecting collision properly in nf_conntrack_proto_sctpXin Long
In Scenario A and B below, as the delayed INIT_ACK always changes the peer vtag, SCTP ct with the incorrect vtag may cause packet loss. Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK 192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151] 192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] * 192.168.1.2 > 192.168.1.1: [COOKIE ECHO] 192.168.1.1 > 192.168.1.2: [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: [COOKIE ACK] Scenario B: INIT_ACK is delayed until the peer completes its own handshake 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] 192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] * This patch fixes it as below: In SCTP_CID_INIT processing: - clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir]. (Scenario E) - set ct->proto.sctp.init[dir]. In SCTP_CID_INIT_ACK processing: - drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C) - drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A) In SCTP_CID_COOKIE_ACK processing: - clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir]. (Scenario D) Also, it's important to allow the ct state to move forward with cookie_echo and cookie_ack from the opposite dir for the collision scenarios. There are also other Scenarios where it should allow the packet through, addressed by the processing above: Scenario C: new CT is created by INIT_ACK. Scenario D: start INIT on the existing ESTABLISHED ct. Scenario E: start INIT after the old collision on the existing ESTABLISHED ct. 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] (both side are stopped, then start new connection again in hours) 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742] Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-03net: add sysctl to disable rfc4862 5.5.3e lifetime handlingPatrick Rohr
This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid lifetime derivation mechanism. RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router Advertisement PIO shall be ignored if it less than 2 hours and to reset the lifetime of the corresponding address to 2 hours. An in-progress 6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently looking to remove this mechanism. While this draft has not been moving particularly quickly for other reasons, there is widespread consensus on section 4.2 which updates RFC4862 section 5.5.3e. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Jen Linkova <furry@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230925214711.959704-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03overflow: add DEFINE_FLEX() for on-stack allocsPrzemek Kitszel
Add DEFINE_FLEX() macro for on-stack allocations of structs with flexible array member. Expose __struct_size() macro outside of fortify-string.h, as it could be used to read size of structs allocated by DEFINE_FLEX(). Move __member_size() alongside it. -Kees Using underlying array for on-stack storage lets us to declare known-at-compile-time structures without kzalloc(). Actual usage for ice driver is in following patches of the series. Missing __has_builtin() workaround is moved up to serve also assembly compilation with m68k-linux-gcc, see [1]. Error was (note the .S file extension): In file included from ../include/linux/linkage.h:5, from ../arch/m68k/fpsp040/skeleton.S:40: ../include/linux/compiler_types.h:331:5: warning: "__has_builtin" is not defined, evaluates to 0 [-Wundef] 331 | #if __has_builtin(__builtin_dynamic_object_size) | ^~~~~~~~~~~~~ ../include/linux/compiler_types.h:331:18: error: missing binary operator before token "(" 331 | #if __has_builtin(__builtin_dynamic_object_size) | ^ [1] https://lore.kernel.org/netdev/202308112122.OuF0YZqL-lkp@intel.com/ Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20230912115937.1645707-2-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03bpf: Remove xdp_do_flush_map().Sebastian Andrzej Siewior
xdp_do_flush_map() can be removed because there is no more user in tree. Remove xdp_do_flush_map(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230908143215.869913-3-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02ipv4/igmp: Annotate struct ip_sf_socklist with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct ip_sf_socklist. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230922172858.3822653-2-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-01Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Fourteen hotfixes, eleven of which are cc:stable. The remainder pertain to issues which were introduced after 6.5" * tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Crash: add lock to serialize crash hotplug handling selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions() mm, memcg: reconsider kmem.limit_in_bytes deprecation mm: zswap: fix potential memory corruption on duplicate store arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries mm: hugetlb: add huge page size param to set_huge_pte_at() maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states maple_tree: add mas_is_active() to detect in-tree walks nilfs2: fix potential use after free in nilfs_gccache_submit_read_data() mm: abstract moving to the next PFN mm: report success more often from filemap_map_folio_range() fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
2023-10-01Merge tag 'timers-urgent-2023-10-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a spurious kernel warning during CPU hotplug events that may trigger when timer/hrtimer softirqs are pending, which are otherwise hotplug-safe and don't merit a warning" * tag 'timers-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Tag (hr)timer softirq as hotplug safe
2023-10-01net: add DEV_STATS_READ() helperEric Dumazet
Companion of DEV_STATS_INC() & DEV_STATS_ADD(). This is going to be used in the series. Use it in macsec_get_stats64(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-30Merge tag 'dma-mapping-6.6-2023-09-30' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix the narea calculation in swiotlb initialization (Ross Lagerwall) - fix the check whether a device has used swiotlb (Petr Tesarik) * tag 'dma-mapping-6.6-2023-09-30' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix the check whether a device has used software IO TLB swiotlb: use the calculated number of areas
2023-09-29mm: hugetlb: add huge page size param to set_huge_pte_at()Ryan Roberts
Patch series "Fix set_huge_pte_at() panic on arm64", v2. This series fixes a bug in arm64's implementation of set_huge_pte_at(), which can result in an unprivileged user causing a kernel panic. The problem was triggered when running the new uffd poison mm selftest for HUGETLB memory. This test (and the uffd poison feature) was merged for v6.5-rc7. Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable (correctly this time) to get it backported to v6.5, where the issue first showed up. Description of Bug ================== arm64's huge pte implementation supports multiple huge page sizes, some of which are implemented in the page table with multiple contiguous entries. So set_huge_pte_at() needs to work out how big the logical pte is, so that it can also work out how many physical ptes (or pmds) need to be written. It previously did this by grabbing the folio out of the pte and querying its size. However, there are cases when the pte being set is actually a swap entry. But this also used to work fine, because for huge ptes, we only ever saw migration entries and hwpoison entries. And both of these types of swap entries have a PFN embedded, so the code would grab that and everything still worked out. But over time, more calls to set_huge_pte_at() have been added that set swap entry types that do not embed a PFN. And this causes the code to go bang. The triggering case is for the uffd poison test, commit 99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") - added in v6.5-rc7. Although review shows that there are other call sites that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger on arm64 because arm64 doesn't support UFFD WP. If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise, it will dereference a bad pointer in page_folio(): static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry) { VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry)); return page_folio(pfn_to_page(swp_offset_pfn(entry))); } Fix === The simplest fix would have been to revert the dodgy cleanup commit 18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since things have moved on, this would have required an audit of all the new set_huge_pte_at() call sites to see if they should be converted to set_huge_swap_pte_at(). As per the original intent of the change, it would also leave us open to future bugs when people invariably get it wrong and call the wrong helper. So instead, I've added a huge page size parameter to set_huge_pte_at(). This means that the arm64 code has the size in all cases. It's a bigger change, due to needing to touch the arches that implement the function, but it is entirely mechanical, so in my view, low risk. I've compile-tested all touched arches; arm64, parisc, powerpc, riscv, s390, sparc (and additionally x86_64). I've additionally booted and run mm selftests against arm64, where I observe the uffd poison test is fixed, and there are no other regressions. This patch (of 2): In order to fix a bug, arm64 needs to be told the size of the huge page for which the pte is being set in set_huge_pte_at(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. No behavioral changes intended. Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx] Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> [vmalloc change] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW statesLiam R. Howlett
When updating the maple tree iterator to avoid rewalks, an issue was introduced when shifting beyond the limits. This can be seen by trying to go to the previous address of 0, which would set the maple node to MAS_NONE and keep the range as the last entry. Subsequent calls to mas_find() would then search upwards from mas->last and skip the value at mas->index/mas->last. This showed up as a bug in mprotect which skips the actual VMA at the current range after attempting to go to the previous VMA from 0. Since MAS_NONE may already be set when searching for a value that isn't contained within a node, changing the handling of MAS_NONE in mas_find() would make the code more complicated and error prone. Furthermore, there was no way to tell which limit was hit, and thus which action to take (next or the entry at the current range). This solution is to add two states to track what happened with the previous iterator action. This allows for the expected behaviour of the next command to return the correct item (either the item at the range requested, or the next/previous). Tests are also added and updated accordingly. Link: https://lkml.kernel.org/r/20230921181236.509072-3-Liam.Howlett@oracle.com Link: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b Link: https://lore.kernel.org/linux-mm/20230921181236.509072-1-Liam.Howlett@oracle.com/ Fixes: 39193685d585 ("maple_tree: try harder to keep active node with mas_prev()") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Pedro Falcato <pedro.falcato@gmail.com> Closes: https://gist.github.com/heatd/85d2971fae1501b55b6ea401fbbe485b Closes: https://bugs.archlinux.org/task/79656 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29maple_tree: add mas_is_active() to detect in-tree walksLiam R. Howlett
Patch series "maple_tree: Fix mas_prev() state regression". Pedro Falcato retported an mprotect regression [1] which was bisected back to the iterator changes for maple tree. Root cause analysis showed the mas_prev() running off the end of the VMA space (previous from 0) followed by mas_find(), would skip the first value. This patchset introduces maple state underflow/overflow so the sequence of calls on the maple state will return what the user expects. Users who encounter this bug may see mprotect(), userfaultfd_register(), and mlock() fail on VMAs mapped with address 0. This patch (of 2): Instead of constantly checking each possibility of the maple state, create a fast path that will skip over checking unlikely states. Link: https://lkml.kernel.org/r/20230921181236.509072-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230921181236.509072-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29mm: abstract moving to the next PFNMatthew Wilcox (Oracle)
In order to fix the L1TF vulnerability, x86 can invert the PTE bits for PROT_NONE VMAs, which means we cannot move from one PTE to the next by adding 1 to the PFN field of the PTE. This results in the BUG reported at [1]. Abstract advancing the PTE to the next PFN through a pte_next_pfn() function/macro. Link: https://lkml.kernel.org/r/20230920040958.866520-1-willy@infradead.org Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: syzbot+55cc72f8cc3a549119df@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000d099fa0604f03351@google.com [1] Reviewed-by: Yin Fengwei <fengwei.yin@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29Merge tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A series that fixes an involved 'double watch error' deadlock in RBD marked for stable and two cleanups" * tag 'ceph-for-6.6-rc4' of https://github.com/ceph/ceph-client: rbd: take header_rwsem in rbd_dev_refresh() only when updating rbd: decouple parent info read-in from updating rbd_dev rbd: decouple header read-in from updating rbd_dev->header rbd: move rbd_dev_refresh() definition Revert "ceph: make members in struct ceph_mds_request_args_ext a union" ceph: remove unnecessary check for NULL in parse_longname()
2023-09-28Merge tag 'mlx5-updates-2023-09-19' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-09-19 Misc updates for mlx5 driver 1) From Erez, Add support for multicast forwarding to multi destination in bridge offloads with software steering mode (SMFS). 2) From Jianbo, Utilize the maximum aggregated link speed for police action rate. 3) From Moshe, Add a health error syndrome for pci data poisoned 4) From Shay, Enable 4 ports multiport E-switch 5) From Jiri, Trivial SF code cleanup ==================== Link: https://lore.kernel.org/r/20230920063552.296978-1-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-28ata: libata-scsi: Disable scsi device manage_system_start_stopDamien Le Moal
The introduction of a device link to create a consumer/supplier relationship between the scsi device of an ATA device and the ATA port of that ATA device fixes the ordering of system suspend and resume operations. For suspend, the scsi device is suspended first and the ata port after it. This is fine as this allows the synchronize cache and START STOP UNIT commands issued by the scsi disk driver to be executed before the ata port is disabled. For resume operations, the ata port is resumed first, followed by the scsi device. This allows having the request queue of the scsi device to be unfrozen after the ata port resume is scheduled in EH, thus avoiding to see new requests prematurely issued to the ATA device. Since libata sets manage_system_start_stop to 1, the scsi disk resume operation also results in issuing a START STOP UNIT command to the device being resumed so that the device exits standby power mode. However, restoring the ATA device to the active power mode must be synchronized with libata EH processing of the port resume operation to avoid either 1) seeing the start stop unit command being received too early when the port is not yet resumed and ready to accept commands, or after the port resume process issues commands such as IDENTIFY to revalidate the device. In this last case, the risk is that the device revalidation fails with timeout errors as the drive is still spun down. Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume") disabled issuing the START STOP UNIT command to avoid issues with it. But this is incorrect as transitioning a device to the active power mode from the standby power mode set on suspend requires a media access command. The IDENTIFY, READ LOG and SET FEATURES commands executed in libata EH context triggered by the ata port resume operation may thus fail. Fix these synchronization issues is by handling a device power mode transitions for system suspend and resume directly in libata EH context, without relying on the scsi disk driver management triggered with the manage_system_start_stop flag. To do this, the following libata helper functions are introduced: 1) ata_dev_power_set_standby(): This function issues a STANDBY IMMEDIATE command to transitiom a device to the standby power mode. For HDDs, this spins down the disks. This function applies only to ATA and ZAC devices and does nothing otherwise. This function also does nothing for devices that have the ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag set. For suspend, call ata_dev_power_set_standby() in ata_eh_handle_port_suspend() before the port is disabled and frozen. ata_eh_unload() is also modified to transition all enabled devices to the standby power mode when the system is shutdown or devices removed. 2) ata_dev_power_set_active() and This function applies to ATA or ZAC devices and issues a VERIFY command for 1 sector at LBA 0 to transition the device to the active power mode. For HDDs, since this function will complete only once the disk spin up. Its execution uses the same timeouts as for reset, to give the drive enough time to complete spinup without triggering a command timeout. For resume, call ata_dev_power_set_active() in ata_eh_revalidate_and_attach() after the port has been enabled and before any other command is issued to the device. With these changes, the manage_system_start_stop and no_start_on_resume scsi device flags do not need to be set in ata_scsi_dev_config(). The flag manage_runtime_start_stop is still set to allow the sd driver to spinup/spindown a disk through the sd runtime operations. Fixes: 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-28ata: libata-scsi: link ata port and scsi deviceDamien Le Moal
There is no direct device ancestry defined between an ata_device and its scsi device which prevents the power management code from correctly ordering suspend and resume operations. Create such ancestry with the ata device as the parent to ensure that the scsi device (child) is suspended before the ata device and that resume handles the ata device before the scsi device. The parent-child (supplier-consumer) relationship is established between the ata_port (parent) and the scsi device (child) with the function device_add_link(). The parent used is not the ata_device as the PM operations are defined per port and the status of all devices connected through that port is controlled from the port operations. The device link is established with the new function ata_scsi_slave_alloc(), and this function is used to define the ->slave_alloc callback of the scsi host template of all ata drivers. Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com>
2023-09-27timers: Tag (hr)timer softirq as hotplug safeFrederic Weisbecker
Specific stress involving frequent CPU-hotplug operations, such as running rcutorture for example, may trigger the following message: NOHZ tick-stop error: local softirq work is pending, handler #02!!!" This happens in the CPU-down hotplug process, after CPUHP_AP_SMPBOOT_THREADS whose teardown callback parks ksoftirqd, and before the target CPU shuts down through CPUHP_AP_IDLE_DEAD. In this fragile intermediate state, softirqs waiting for threaded handling may be forever ignored and eventually reported by the idle task as in the above example. However some vectors are known to be safe as long as the corresponding subsystems have teardown callbacks handling the migration of their events. The above error message reports pending timers softirq although this vector can be considered as hotplug safe because the CPUHP_TIMERS_PREPARE teardown callback performs the necessary migration of timers after the death of the CPU. Hrtimers also have a similar hotplug handling. Therefore this error message, as far as (hr-)timers are concerned, can be considered spurious and the relevant softirq vectors can be marked as hotplug safe. Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230912104406.312185-6-frederic@kernel.org
2023-09-27swiotlb: fix the check whether a device has used software IO TLBPetr Tesarik
When CONFIG_SWIOTLB_DYNAMIC=y, devices which do not use the software IO TLB can avoid swiotlb lookup. A flag is added by commit 1395706a1490 ("swiotlb: search the software IO TLB only if the device makes use of it"), the flag is correctly set, but it is then never checked. Add the actual check here. Note that this code is an alternative to the default pool check, not an additional check, because: 1. swiotlb_find_pool() also searches the default pool; 2. if dma_uses_io_tlb is false, the default swiotlb pool is not used. Tested in a KVM guest against a QEMU RAM-backed SATA disk over virtio and *not* using software IO TLB, this patch increases IOPS by approx 2% for 4-way parallel I/O. The write memory barrier in swiotlb_dyn_alloc() is not needed, because a newly allocated pool must always be observed by swiotlb_find_slots() before an address from that pool is passed to is_swiotlb_buffer(). Correctness was verified using the following litmus test: C swiotlb-new-pool (* * Result: Never * * Check that a newly allocated pool is always visible when the * corresponding swiotlb buffer is visible. *) { mem_pools = default; } P0(int **mem_pools, int *pool) { /* add_mem_pool() */ WRITE_ONCE(*pool, 999); rcu_assign_pointer(*mem_pools, pool); } P1(int **mem_pools, int *flag, int *buf) { /* swiotlb_find_slots() */ int *r0; int r1; rcu_read_lock(); r0 = READ_ONCE(*mem_pools); r1 = READ_ONCE(*r0); rcu_read_unlock(); if (r1) { WRITE_ONCE(*flag, 1); smp_mb(); } /* device driver (presumed) */ WRITE_ONCE(*buf, r1); } P2(int **mem_pools, int *flag, int *buf) { /* device driver (presumed) */ int r0 = READ_ONCE(*buf); /* is_swiotlb_buffer() */ int r1; int *r2; int r3; smp_rmb(); r1 = READ_ONCE(*flag); if (r1) { /* swiotlb_find_pool() */ rcu_read_lock(); r2 = READ_ONCE(*mem_pools); r3 = READ_ONCE(*r2); rcu_read_unlock(); } } exists (2:r0<>0 /\ 2:r3=0) (* Not found. *) Fixes: 1395706a1490 ("swiotlb: search the software IO TLB only if the device makes use of it") Reported-by: Jonathan Corbet <corbet@lwn.net> Closes: https://lore.kernel.org/linux-iommu/87a5uz3ob8.fsf@meer.lwn.net/ Signed-off-by: Petr Tesarik <petr@tesarici.cz> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-09-25wifi: ieee80211: add UL-bandwidth definition of trigger framePo-Hao Huang
Define UL-bandwidth values of trigger frame according to 802.11 std. Signed-off-by: Po-Hao Huang <phhuang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://lore.kernel.org/r/20230925080902.51449-2-pkshih@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: mac80211: support antenna control in injectionJohannes Berg
Support antenna control for injection by parsing the antenna radiotap field (which may be presented multiple times) and telling the driver about the resulting antenna bitmap. Of course there's no guarantee the driver will actually honour this, just like any other injection control. If misconfigured, i.e. the injected HT/VHT MCS needs more chains than antennas are configured, the bitmap is reset to zero, indicating no selection. For now this is only set up for two anntenas so we keep more free bits, but that can be trivially extended if any driver implements support for it that can deal with hardware with more antennas. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.f71001aa4da9.I00ccb762a806ea62bc3d728fa3a0d29f4f285eeb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: mac80211: add support for parsing TID to Link mapping elementAyala Beker
Add the relevant definitions for TID to Link mapping element according to the P802.11be_D4.0. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.9ea9b0b4412a.I2281ab2c70e8b43a39032dc115db6a80f1f0b3f4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25wifi: mac80211: use bandwidth indication element for CSAJohannes Berg
In CSA, parse the (EHT) bandwidth indication element and use it (in fact prefer it if present). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230920211508.43ef01920556.If4f24a61cd634ab1e50eba43899b9e992bf25602@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-09-25ata: libata-sata: increase PMP SRST timeout to 10sMatthias Schiffer
On certain SATA controllers, softreset fails after wakeup from S2RAM with the message "softreset failed (1st FIS failed)", sometimes resulting in drives not being detected again. With the increased timeout, this issue is avoided. Instead, "softreset failed (device not ready)" is now logged 1-2 times; this later failure seems to cause fewer problems however, and the drives are detected reliably once they've spun up and the probe is retried. The issue was observed with the primary SATA controller of the QNAP TS-453B, which is an "Intel Corporation Celeron/Pentium Silver Processor SATA Controller [8086:31e3] (rev 06)" integrated in the Celeron J4125 CPU, and the following drives: - Seagate IronWolf ST12000VN0008 - Seagate IronWolf ST8000NE0004 The SATA controller seems to be more relevant to this issue than the drives, as the same drives are always detected reliably on the secondary SATA controller on the same board (an ASMedia 106x) without any "softreset failed" errors even without the increased timeout. Fixes: e7d3ef13d52a ("libata: change drive ready wait after hard reset to 5s") Cc: stable@vger.kernel.org Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-09-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix EL2 Stage-1 MMIO mappings where a random address was used - Fix SMCCC function number comparison when the SVE hint is set RISC-V: - Fix KVM_GET_REG_LIST API for ISA_EXT registers - Fix reading ISA_EXT register of a missing extension - Fix ISA_EXT register handling in get-reg-list test - Fix filtering of AIA registers in get-reg-list test x86: - Fixes for TSC_AUX virtualization - Stop zapping page tables asynchronously, since we don't zap them as often as before" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX KVM: SVM: Fix TSC_AUX virtualization setup KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe() KVM: x86/mmu: Open code leaf invalidation from mmu_notifier KVM: riscv: selftests: Selectively filter-out AIA registers KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers KVM: selftests: Assert that vasprintf() is successful KVM: arm64: nvhe: Ignore SVE hint in SMCCC function ID KVM: arm64: Properly return allocated EL2 VA from hyp_alloc_private_va_range()
2023-09-24Merge tag 'cxl-fixes-6.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "A collection of regression fixes, bug fixes, and some small cleanups to the Compute Express Link code. The regressions arrived in the v6.5 dev cycle and missed the v6.6 merge window due to my personal absences this cycle. The most important fixes are for scenarios where the CXL subsystem fails to parse valid region configurations established by platform firmware. This is important because agreement between OS and BIOS on the CXL configuration is fundamental to implementing "OS native" error handling, i.e. address translation and component failure identification. Other important fixes are a driver load error when the BIOS lets the Linux PCI core handle AER events, but not CXL memory errors. The other fixex might have end user impact, but for now are only known to trigger in our test/emulation environment. Summary: - Fix multiple scenarios where platform firmware defined regions fail to be assembled by the CXL core. - Fix a spurious driver-load failure on platforms that enable OS native AER, but not OS native CXL error handling. - Fix a regression detecting "poison" commands when "security" commands are also defined. - Fix a cxl_test regression with the move to centralize CXL port register enumeration in the CXL core. - Miscellaneous small fixes and cleanups" * tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/acpi: Annotate struct cxl_cxims_data with __counted_by cxl/port: Fix cxl_test register enumeration regression cxl/region: Refactor granularity select in cxl_port_setup_targets() cxl/region: Match auto-discovered region decoders by HPA range cxl/mbox: Fix CEL logic for poison and security commands cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native() PCI/AER: Export pcie_aer_is_native() cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
2023-09-23Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three are cc:stable" * tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: proc: nommu: fix empty /proc/<pid>/maps filemap: add filemap_map_order0_folio() to handle order0 folio proc: nommu: /proc/<pid>/maps: release mmap read lock mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement pidfd: prevent a kernel-doc warning argv_split: fix kernel-doc warnings scatterlist: add missing function params to kernel-doc selftests/proc: fixup proc-empty-vm test after KSM changes revert "scripts/gdb/symbols: add specific ko module load command" selftests: link libasan statically for tests with -fsanitize=address task_work: add kerneldoc annotation for 'data' argument mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
2023-09-23Merge tag 'loongarch-fixes-6.6-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix lockdep, fix a boot failure, fix some build warnings, fix document links, and some cleanups" * tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: docs/zh_CN/LoongArch: Update the links of ABI docs/LoongArch: Update the links of ABI LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem() kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage LoongArch: Set all reserved memblocks on Node#0 at initialization LoongArch: Remove dead code in relocate_new_kernel LoongArch: Use _UL() and _ULL() LoongArch: Fix some build warnings with W=1 LoongArch: Fix lockdep static memory detection