summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-10-15net: phy: support 'active-high' property for PHY LEDsDaniel Golle
In addition to 'active-low' and 'inactive-high-impedance' also support 'active-high' property for PHY LED pin configuration. As only either 'active-high' or 'active-low' can be set at the same time, WARN and return an error in case both are set. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/91598487773d768f254d5faf06cf65b13e972f0e.1728558223.git.daniel@makrotopia.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-14net: napi: Add napi_configJoe Damato
Add a persistent NAPI config area for NAPI configuration to the core. Drivers opt-in to setting the persistent config for a NAPI by passing an index when calling netif_napi_add_config. napi_config is allocated in alloc_netdev_mqs, freed in free_netdev (after the NAPIs are deleted). Drivers which call netif_napi_add_config will have persistent per-NAPI settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings. Per-NAPI settings are saved in napi_disable and restored in napi_enable. Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14net: napi: Make gro_flush_timeout per-NAPIJoe Damato
Allow per-NAPI gro_flush_timeout setting. The existing sysfs parameter is respected; writes to sysfs will write to all NAPI structs for the device and the net_device gro_flush_timeout field. Reads from sysfs will read from the net_device field. The ability to set gro_flush_timeout on specific NAPI instances will be added in a later commit, via netdev-genl. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14net: napi: Make napi_defer_hard_irqs per-NAPIJoe Damato
Add defer_hard_irqs to napi_struct in preparation for per-NAPI settings. The existing sysfs parameter is respected; writes to sysfs will write to all NAPI structs for the device and the net_device defer_hard_irq field. Reads from sysfs show the net_device field. The ability to set defer_hard_irqs on specific NAPI instances will be added in a later commit, via netdev-genl. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-14net: add TIME_WAIT logic to sk_to_full_sk()Eric Dumazet
TCP will soon attach TIME_WAIT sockets to some ACK and RST. Make sure sk_to_full_sk() detects this and does not return a non full socket. v3: also changed sk_const_to_full_sk() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Brian Vazquez <brianvv@google.com> Link: https://patch.msgid.link/20241010174817.1543642-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-13net: skb: add pskb_network_may_pull_reason() helperMenglong Dong
Introduce the function pskb_network_may_pull_reason() and make pskb_network_may_pull() a simple inline call to it. The drop reasons of it just come from pskb_may_pull_reason. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10Merge branch 'net-introduce-tx-h-w-shaping-api'Jakub Kicinski
Paolo Abeni says: ==================== net: introduce TX H/W shaping API We have a plurality of shaping-related drivers API, but none flexible enough to meet existing demand from vendors[1]. This series introduces new device APIs to configure in a flexible way TX H/W shaping. The new functionalities are exposed via a newly defined generic netlink interface and include introspection capabilities. Some self-tests are included, on top of a dummy netdevsim implementation. Finally a basic implementation for the iavf driver is provided. Some usage examples: * Configure shaping on a given queue: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do set --json '{"ifindex": '$IFINDEX', "shaper": {"handle": {"scope": "queue", "id":'$QUEUEID'}, "bw-max": 2000000}}' * Container B/W sharing The orchestration infrastructure wants to group the container-related queues under a RR scheduling and limit the aggregate bandwidth: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope":"node"}, "bw-max": 10000000}' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}} Q1 \ \ Q2 -- node 0 ------- netdev / (bw-max: 10M) Q3 / * Delegation A containers wants to limit the aggregate B/W bandwidth of 2 of the 3 queues it owns - the starting configuration is the one from the previous point: SPEC=Documentation/netlink/specs/net_shaper.yaml ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID2'}, "weight": '$W2'}], "handle": {"scope": "node"}, "bw-max": 5000000 }' {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}} Q1 -- node 1 --------\ / (bw-max: 5M) \ Q2 / node 0 ------- netdev /(bw-max: 10M) Q3 ------------------/ In a group operation, when prior to the op itself, the leaves have different parents, the user must specify the parent handle for the group. I.e., starting from the previous config: ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "bw-max": 3000000 }' Netlink error: Invalid argument nl_len = 96 (80) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'All the leaves shapers must have the same old parent'} ./tools/net/ynl/cli.py --spec $SPEC \ --do group --json '{"ifindex": '$IFINDEX', "leaves": [ {"handle": {"scope": "queue", "id":'$QID1'}, "weight": '$W1'}, {"handle": {"scope": "queue", "id":'$QID3'}, "weight": '$W3'}], "handle": {"scope": "node"}, "parent": {"scope": "node", "id": 1}, "bw-max": 3000000 } {'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}} Q1 -- node 2 --- /(bw-max:3M)\ Q3 / \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) * Cleanup: Still starting from config 1To delete a single queue shaper ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID3'}}' Q1 -- node 2 --- (bw-max:3M)\ \ ---- node 1 \ / (bw-max: 5M)\ Q2 node 0 ------- netdev (bw-max: 10M) Deleting a node shaper relinks all its leaves to the node's parent: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id":2}}' Q1 ---\ \ node 1----- \ / (bw-max: 5M)\ Q2----/ node 0 ------- netdev (bw-max: 10M) Deleting the last shaper under a node shaper deletes the node, too: ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID1'}}' ./tools/net/ynl/cli.py --spec $SPEC --do delete --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "queue", "id":'$QID2'}}' ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 1}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} Such delete recurses on parents that are left over with no leaves: ./tools/net/ynl/cli.py --spec $SPEC --do get --json \ '{"ifindex": '$IFINDEX', "handle": {"scope": "node", "id": 0}}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.handle'} v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com ==================== Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10virtchnl: support queue rate limit and quanta size configurationWenjun Wu
This patch adds new virtchnl opcodes and structures for rate limit and quanta size configuration, which include: 1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each VF per queue. 2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue. 3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The configuration is previously set by DCB and PF, and now is the potential QoS capability of VF. VF can take it as reference to configure queue TC mapping. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/839002f7bd6f63b985a060a51b079f6e6dbbe237.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net-shapers: implement NL get operationPaolo Abeni
Introduce the basic infrastructure to implement the net-shaper core functionality. Each network devices carries a net-shaper cache, the NL get() operation fetches the data from such cache. The cache is initially empty, will be fill by the set()/group() operation implemented later and is destroyed at device cleanup time. The net_shaper_fill_handle(), net_shaper_ctx_init(), and net_shaper_generic_pre() implementations handle generic index type attributes, despite the current caller always pass a constant value to avoid more noise in later patches using them with different attributes. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ddd10fd645a9367803ad02fca4a5664ea5ace170.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10genetlink: extend info user-storage to match NL cb ctxPaolo Abeni
This allows a more uniform implementation of non-dump and dump operations, and will be used later in the series to avoid some per-operation allocation. Additionally rename the NL_ASSERT_DUMP_CTX_FITS macro, to fit a more extended usage. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1130cc2896626b84587a2a5f96a5c6829638f4da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10net/mlx5: qos: Flesh out element_attributes in mlx5_ifc.hCosmin Ratiu
This is used for multiple purposes, depending on the scheduling element created. There are a few helper struct defined a long time ago, but they are not easy to find in the file and they are about to get new members. This commit cleans up this area a bit by: - moving the helper structs closer to where they are relevant. - defining a helper union to include all of them to help discoverability. - making use of it everywhere element_attributes is used. - using a consistent 'attr' name. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-09ipv4: Retire global IPv4 hash table inet_addr_lst.Kuniyuki Iwashima
No one uses inet_addr_lst anymore, so let's remove it. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241008172906.1326-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09ipv4: Link IPv4 address to per-netns hash table.Kuniyuki Iwashima
As a prep for per-netns RTNL conversion, we want to namespacify the IPv4 address hash table and the GC work. Let's allocate the per-netns IPv4 address hash table to net->ipv4.inet_addr_lst and link IPv4 addresses into it. The actual users will be converted later. Note that the IPv6 address hash table is already namespacified. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241008172906.1326-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"Michal Hocko
This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4. There is no existing user of those flags. PF_MEMALLOC_NOWARN is dangerous because a nested allocation context can use GFP_NOFAIL which could cause unexpected failure. Such a code would be hard to maintain because it could be deeper in the call chain. PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1] that such a allocation contex is inherently unsafe if the context doesn't fully control all allocations called from this context. While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM is it doesn't have any user and as Matthew has pointed out we are running out of those flags so better reclaim it without any real users. [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/ Link: https://lkml.kernel.org/r/20240926172940.167084-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: James Morris <jmorris@namei.org> Cc: Jan Kara <jack@suse.cz> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Paul Moore <paul@paul-moore.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09bcachefs: do not use PF_MEMALLOC_NORECLAIMMichal Hocko
Patch series "remove PF_MEMALLOC_NORECLAIM" v3. This patch (of 2): bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new inode to achieve GFP_NOWAIT semantic while holding locks. If this allocation fails it will drop locks and use GFP_NOFS allocation context. We would like to drop PF_MEMALLOC_NORECLAIM because it is really dangerous to use if the caller doesn't control the full call chain with this flag set. E.g. if any of the function down the chain needed GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and cause unexpected failure. While this is not the case in this particular case using the scoped gfp semantic is not really needed bacause we can easily pus the allocation context down the chain without too much clutter. [akpm@linux-foundation.org: fix kerneldoc warnings] Link: https://lkml.kernel.org/r/20240926172940.167084-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20240926172940.167084-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> # For vfs changes Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: James Morris <jmorris@namei.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Paul Moore <paul@paul-moore.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09net: pcs: xpcs: move definition of struct dw_xpcs to private headerRussell King (Oracle)
There should be no reason for anything outside the XPCS code to know the contents of struct dw_xpcs - this is a private structure to XPCS. Move the definition to the private pcs-xpcs.h header, leaving a declaration in the global pcs/pcs-xpcs.h Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-09net: pcs: xpcs: provide a helper to get the phylink pcs given xpcsRussell King (Oracle)
Provide a helper to provide the pointer to the phylink_pcs struct given a valid xpcs pointer. This will be necessary when we make struct dw_xpcs private to pcs-xpcs.c Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-08rtnetlink: Add assertion helpers for per-netns RTNL.Kuniyuki Iwashima
Once an RTNL scope is converted with rtnl_net_lock(), we will replace RTNL helper functions inside the scope with the following per-netns alternatives: ASSERT_RTNL() -> ASSERT_RTNL_NET(net) rcu_dereference_rtnl(p) -> rcu_dereference_rtnl_net(net, p) Note that the per-netns helpers are equivalent to the conventional helpers unless CONFIG_DEBUG_NET_SMALL_RTNL is enabled. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08rtnetlink: Add per-netns RTNL.Kuniyuki Iwashima
The goal is to break RTNL down into per-netns mutex. This patch adds per-netns mutex and its helper functions, rtnl_net_lock() and rtnl_net_unlock(). rtnl_net_lock() acquires the global RTNL and per-netns RTNL mutex, and rtnl_net_unlock() releases them. We will replace 800+ rtnl_lock() with rtnl_net_lock() and finally removes rtnl_lock() in rtnl_net_lock(). When we need to nest per-netns RTNL mutex, we will use __rtnl_net_lock(), and its locking order is defined by rtnl_net_lock_cmp_fn() as follows: 1. init_net is first 2. netns address ascending order Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL with LOCKDEP so that we can carefully add the extra mutex without slowing down RTNL operations during conversion. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08Revert "rtnetlink: add guard for RTNL"Kuniyuki Iwashima
This reverts commit 464eb03c4a7cfb32cb3324249193cf6bb5b35152. Once we have a per-netns RTNL, we won't use guard(rtnl). Also, there's no users for now. $ grep -rnI "guard(rtnl" || true $ Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89i+KoYzUH+VPLdGmLABYf5y4TW0hrM4UAeQQJ9AREty0iw@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-08net: phy: Add support for PHY timing-role configuration via device treeOleksij Rempel
Introduce support for configuring the master/slave role of PHYs based on the `timing-role` property in the device tree. While this functionality is necessary for Single Pair Ethernet (SPE) PHYs (1000/100/10Base-T1) where hardware strap pins may be unavailable or incorrectly set, it works for any PHY type. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-07ipv4: remove fib_info_devhash[]Eric Dumazet
Upcoming per-netns RTNL conversion needs to get rid of shared hash tables. fib_info_devhash[] is one of them. It is unclear why we used a hash table, because a single hlist_head per net device was cheaper and scalable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04net: add IFLA_MAX_PACING_OFFLOAD_HORIZON device attributeEric Dumazet
Some network devices have the ability to offload EDT (Earliest Departure Time) which is the model used for TCP pacing and FQ packet scheduler. Some of them implement the timing wheel mechanism described in https://saeed.github.io/files/carousel-sigcomm17.pdf with an associated 'timing wheel horizon'. This patch adds dev->max_pacing_offload_horizon expressing this timing wheel horizon in nsec units. This is a read-only attribute. Unless a driver sets it, dev->max_pacing_offload_horizon is zero. v2: addressed Jakub feedback ( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 ) v3: added yaml doc (also per Jakub feedback) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20241003121219.2396589-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Merge tag 'pm-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two cpufreq issues, one in the core and one in the intel_pstate driver: - Fix CPU device node reference counting in the cpufreq core (Miquel Sabaté Solà) - Turn the spinlock used by the intel_pstate driver in hard IRQ context into a raw one to prevent the driver from crashing when PREEMPT_RT is enabled (Uwe Kleine-König)" * tag 'pm-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Avoid a bad reference count on CPU node cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
2024-10-04net/mlx5: hw counters: Remove mlx5_fc_create_exCosmin Ratiu
It no longer serves any purpose and is identical to mlx5_fc_create upon which it was originally based of. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04net/mlx5: hw counters: Make fc_stats & fc_pool privateCosmin Ratiu
The mlx5_fc_stats and mlx5_fc_pool structs are only used from fs_counters.c. As such, make them private there. mlx5_fc_pool is not used or referenced at all outside fs_counters. mlx5_fc_stats is referenced from mlx5_core_dev, so instead of having it as a direct member (which requires exporting it from fs_counters), store a pointer to it, allocate it on init and clear it on destroy. One caveat is that a simple container_of to get from a 'work' struct to the outermost mlx5_core_dev struct directly no longer works, so an extra pointer had to be added to mlx5_fc_stats back to the parent dev. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241001103709.58127-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Merge tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly fixes, xe and amdgpu lead the way, with panthor, and few core components getting various fixes. Nothing seems too out of the ordinary. atomic: - Use correct type when reading damage rectangles display: - Fix kernel docs dp-mst: - Fix DSC decompression detection hdmi: - Fix infoframe size sched: - Update maintainers - Fix race condition whne queueing up jobs - Fix locking in drm_sched_entity_modify_sched() - Fix pointer deref if entity queue changes sysfb: - Disable sysfb if framebuffer parent device is unknown amdgpu: - DML2 fix - DSC fix - Dispclk fix - eDP HDR fix - IPS fix - TBT fix i915: - One fix for bitwise and logical "and" mixup in PM code xe: - Restore pci state on resume - Fix locking on submission, queue and vm - Fix UAF on queue destruction - Fix resource release on freq init error path - Use rw_semaphore to reduce contention on ASID->VM lookup - Fix steering for media on Xe2_HPM - Tuning updates to Xe2 - Resume TDR after GT reset to prevent jobs running forever - Move id allocation to avoid userspace using a guessed number to trigger UAF - Fix OA stream close preventing pbatch buffers to complete - Fix NPD when migrating memory on LNL - Fix memory leak when aborting binds panthor: - Fix locking - Set FOP_UNSIGNED_OFFSET in fops instance - Acquire lock in panthor_vm_prepare_map_op_ctx() - Avoid uninitialized variable in tick_ctx_cleanup() - Do not block scheduler queue if work is pending - Do not add write fences to the shared BOs vbox: - Fix VLA handling" * tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernel: (41 commits) drm/xe: Fix memory leak when aborting binds drm/xe: Prevent null pointer access in xe_migrate_copy drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close drm/xe/queue: move xa_alloc to prevent UAF drm/xe/vm: move xa_alloc to prevent UAF drm/xe: Clean up VM / exec queue file lock usage. drm/xe: Resume TDR after GT reset drm/xe/xe2: Add performance tuning for L3 cache flushing drm/xe/xe2: Extend performance tuning to media GT drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM drm/xe: Use helper for ASID -> VM in GPU faults and access counters drm/xe: Convert to USM lock to rwsem drm/xe: use devm_add_action_or_reset() helper drm/xe: fix UAF around queue destruction drm/xe/guc_submit: add missing locking in wedged_fini drm/xe: Restore pci state upon resume drm/amd/display: Fix system hang while resume with TBT monitor drm/amd/display: Enable idle workqueue for more IPS modes drm/amd/display: Add HDR workaround for specific eDP drm/amd/display: avoid set dispclk to 0 ...
2024-10-04Merge tag 'fsnotify_for_v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Fixes for an inotify deadlock and a data race in fsnotify" * tag 'fsnotify_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Fix possible deadlock in fsnotify_destroy_mark fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()
2024-10-04Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull close_range() fix from Al Viro: "Fix the logic in descriptor table trimming" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: close_range(): fix the logics in descriptor table trimming
2024-10-04appletalk: Remove deadcodeDr. David Alan Gilbert
alloc_ltalkdev in net/appletalk/dev.c is dead since commit 00f3696f7555 ("net: appletalk: remove cops support") Removing it (and it's helper) leaves dev.c and if_ltalk.h empty; remove them and the Makefile entry. tun.c was including that if_ltalk.h but actually wanted the uapi version for LTALK_ALEN, fix up the path. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-03lib: packing: duplicate pack() and unpack() implementationsVladimir Oltean
packing() is now used in some hot paths, and it would be good to get rid of some ifs and buts that depend on "op", to speed things up a little bit. With the main implementations now taking size_t endbit, we no longer have to check for negative values. Update the local integer variables to also be size_t to match. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-5-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03lib: packing: add pack() and unpack() wrappers over packing()Vladimir Oltean
Geert Uytterhoeven described packing() as "really bad API" because of not being able to enforce const correctness. The same function is used both when "pbuf" is input and "uval" is output, as in the other way around. Create 2 wrapper functions where const correctness can be ensured. Do ugly type casts inside, to be able to reuse packing() as currently implemented - which will _not_ modify the input argument. Also, take the opportunity to change the type of startbit and endbit to size_t - an unsigned type - in these new function prototypes. When int, an extra check for negative values is necessary. Hopefully, when packing() goes away completely, that check can be dropped. My concern is that code which does rely on the conditional directionality of packing() is harder to refactor without blowing up in size. So it may take a while to completely eliminate packing(). But let's make alternatives available for those who do not need that. Link: https://lore.kernel.org/netdev/20210223112003.2223332-1-geert+renesas@glider.be/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-4-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03lib: packing: remove kernel-doc from header fileVladimir Oltean
It is not necessary to have the kernel-doc duplicated both in the header and in the implementation. It is better to have it near the implementation of the function, since in C, a function can have N declarations, but only one definition. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-3-8373e551eae3@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03Merge tag 'net-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from ieee802154, bluetooth and netfilter. Current release - regressions: - eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc - eth: am65-cpsw: fix forever loop in cleanup code Current release - new code bugs: - eth: mlx5: HWS, fixed double-free in error flow of creating SQ Previous releases - regressions: - core: avoid potential underflow in qdisc_pkt_len_init() with UFO - core: test for not too small csum_start in virtio_net_hdr_to_skb() - vrf: revert "vrf: remove unnecessary RCU-bh critical section" - bluetooth: - fix uaf in l2cap_connect - fix possible crash on mgmt_index_removed - dsa: improve shutdown sequence - eth: mlx5e: SHAMPO, fix overflow of hd_per_wq - eth: ip_gre: fix drops of small packets in ipgre_xmit Previous releases - always broken: - core: fix gso_features_check to check for both dev->gso_{ipv4_,}max_size - core: fix tcp fraglist segmentation after pull from frag_list - netfilter: nf_tables: prevent nf_skb_duplicated corruption - sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start - mac802154: fix potential RCU dereference issue in mac802154_scan_worker - eth: fec: restart PPS after link state change" * tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits) sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems doc: net: napi: Update documentation for napi_schedule_irqoff net/ncsi: Disable the ncsi work before freeing the associated structure net: phy: qt2025: Fix warning: unused import DeviceId gso: fix udp gso fraglist segmentation after pull from frag_list bridge: mcast: Fail MDB get request on empty entry vrf: revert "vrf: Remove unnecessary RCU-bh critical section" net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code net: phy: realtek: Check the index value in led_hw_control_get ppp: do not assume bh is held in ppp_channel_bridge_input() selftests: rds: move include.sh to TEST_FILES net: test for not too small csum_start in virtio_net_hdr_to_skb() net: gso: fix tcp fraglist segmentation after pull from frag_list ipv4: ip_gre: Fix drops of small packets in ipgre_xmit net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check net: add more sanity checks to qdisc_pkt_len_init() net: avoid potential underflow in qdisc_pkt_len_init() with UFO net: ethernet: ti: cpsw_ale: Fix warning on some platforms net: microchip: Make FDMA config symbol invisible ...
2024-10-03Merge tag 'vfs-6.12-rc2.fixes.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "vfs: - Ensure that iter_folioq_get_pages() advances to the next slot otherwise it will end up using the same folio with an out-of-bound offset. iomap: - Dont unshare delalloc extents which can't be reflinked, and thus can't be shared. - Constrain the file range passed to iomap_file_unshare() directly in iomap instead of requiring the callers to do it. netfs: - Use folioq_count instead of folioq_nr_slot to prevent an unitialized value warning in netfs_clear_buffer(). - Fix missing wakeup after issuing writes by scheduling the write collector only if all the subrequest queues are empty and thus no writes are pending. - Fix two minor documentation bugs" * tag 'vfs-6.12-rc2.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: constrain the file range passed to iomap_file_unshare iomap: don't bother unsharing delalloc extents netfs: Fix missing wakeup after issuing writes Documentation: add missing folio_queue entry folio_queue: fix documentation netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer iov_iter: fix advancing slot in iter_folioq_get_pages()
2024-10-02net: pcs: xpcs: make xpcs_do_config() and xpcs_link_up() internalRussell King (Oracle)
As nothing outside pcs-xpcs.c calls neither xpcs_do_config() nor xpcs_link_up(), remove their exports and prototypes. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMv-005ZIv-2M@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02net: pcs: xpcs: drop interface argument from xpcs_create*()Russell King (Oracle)
The XPCS sub-driver no longer uses the "interface" argument to the xpcs_create_mdiodev() and xpcs_create_fwnode() functions. Remove this now unnecessary argument, updating the stmmac driver appropriately. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMp-005ZIp-UX@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02net: pcs: xpcs: add xpcs_destroy_pcs() and xpcs_create_pcs_mdiodev()Russell King (Oracle)
Provide xpcs create/destroy functions that return and take a phylink_pcs pointer instead of an xpcs pointer. This will be used by drivers that have been converted to use phylink_pcs pointers internally, rather than dw_xpcs pointers. As xpcs_create_mdiodev() no longer makes use of its interface argument, pass PHY_INTERFACE_MODE_NA into xpcs_create_mdiodev() until it is removed later in the series. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1svfMQ-005ZIL-Bi@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02net: pcs: xpcs: move PCS reset to .pcs_pre_config()Russell King (Oracle)
Move the PCS reset to .pcs_pre_config() rather than at creation time, which means we call the reset function with the interface that we're actually going to be using to talk to the downstream device. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # sja1105 Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: for them? Link: https://patch.msgid.link/E1svfMA-005ZI3-Va@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02net: test for not too small csum_start in virtio_net_hdr_to_skb()Eric Dumazet
syzbot was able to trigger this warning [1], after injecting a malicious packet through af_packet, setting skb->csum_start and thus the transport header to an incorrect value. We can at least make sure the transport header is after the end of the network header (with a estimated minimal size). [1] [ 67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0 mac=(-1,-1) mac_len=0 net=(16,-6) trans=10 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9 [ 67.877764] sk family=17 type=3 proto=0 [ 67.878279] skb linear: 00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00 [ 67.879128] skb frag: 00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02 [ 67.879877] skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.880647] skb frag: 00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00 [ 67.881156] skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.881753] skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882173] skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.882790] skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883171] skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.883733] skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.884206] skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e [ 67.884704] skb frag: 000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00 [ 67.885139] skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.885677] skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886042] skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.886408] skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887020] skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 67.887384] skb frag: 00000100: 00 00 [ 67.887878] ------------[ cut here ]------------ [ 67.887908] offset (-6) >= skb_headlen() (14) [ 67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs [ 67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011 [ 67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891043] Call Trace: [ 67.891173] <TASK> [ 67.891274] ? __warn (kernel/panic.c:741) [ 67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 67.891348] ? handle_bug (arch/x86/kernel/traps.c:239) [ 67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2)) [ 67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1)) [ 67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113) [ 67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200) [ 67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13)) [ 67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1)) [ 67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670) [ 67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227) [ 67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604) [ 67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25)) [ 67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1)) [ 67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4)) [ 67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657) [ 67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3)) [ 67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 67.891734] ? do_sock_setsockopt (net/socket.c:2335) [ 67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Fixes: 9181d6f8a2bb ("net: add more sanity check in virtio_net_hdr_to_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02Merge tag 'mlx5-fixes-2024-09-25' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-09-25 * tag 'mlx5-fixes-2024-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix crash caused by calling __xfrm_state_delete() twice net/mlx5e: SHAMPO, Fix overflow of hd_per_wq net/mlx5: HWS, changed E2BIG error to a negative return code net/mlx5: HWS, fixed double-free in error flow of creating SQ net/mlx5: Fix wrong reserved field in hca_cap_2 in mlx5_ifc net/mlx5e: Fix NULL deref in mlx5e_tir_builder_alloc() net/mlx5: Added cond_resched() to crdump collection net/mlx5: Fix error path in multi-packet WQE transmit ==================== Link: https://patch.msgid.link/20240925202013.45374-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02Merge tag 'pull-work.unaligned' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull generic unaligned.h cleanups from Al Viro: "Get rid of architecture-specific <asm/unaligned.h> includes, replacing them with a single generic <linux/unaligned.h> header file. It's the second largest (after asm/io.h) class of asm/* includes, and all but two architectures actually end up using exact same file. Massage the remaining two (arc and parisc) to do the same and just move the thing to from asm-generic/unaligned.h to linux/unaligned.h" [ This is one of those things that we're better off doing outside the merge window, and would only cause extra conflict noise if it was in linux-next for the next release due to all the trivial #include line updates. Rip off the band-aid. - Linus ] * tag 'pull-work.unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move asm/unaligned.h to linux/unaligned.h arc: get rid of private asm/unaligned.h parisc: get rid of private asm/unaligned.h
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02inotify: Fix possible deadlock in fsnotify_destroy_markLizhi Xu
[Syzbot reported] WARNING: possible circular locking dependency detected 6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted ------------------------------------------------------ kswapd0/78 is trying to acquire lock: ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 but task is already holding lock: ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline] ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: ... kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044 inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline] inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline] __do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline] __se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&group->mark_mutex){+.+.}-{3:3}: ... __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline] fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578 fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934 fsnotify_inoderemove include/linux/fsnotify.h:264 [inline] dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403 __dentry_kill+0x20d/0x630 fs/dcache.c:610 shrink_kill+0xa9/0x2c0 fs/dcache.c:1055 shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082 prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163 super_cache_scan+0x34f/0x4b0 fs/super.c:221 do_shrink_slab+0x701/0x1160 mm/shrinker.c:435 shrink_slab+0x1093/0x14d0 mm/shrinker.c:662 shrink_one+0x43b/0x850 mm/vmscan.c:4815 shrink_many mm/vmscan.c:4876 [inline] lru_gen_shrink_node mm/vmscan.c:4954 [inline] shrink_node+0x3799/0x3de0 mm/vmscan.c:5934 kswapd_shrink_node mm/vmscan.c:6762 [inline] balance_pgdat mm/vmscan.c:6954 [inline] kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223 [Analysis] The problem is that inotify_new_watch() is using GFP_KERNEL to allocate new watches under group->mark_mutex, however if dentry reclaim races with unlinking of an inode, it can end up dropping the last dentry reference for an unlinked inode resulting in removal of fsnotify mark from reclaim context which wants to acquire group->mark_mutex as well. This scenario shows that all notification groups are in principle prone to this kind of a deadlock (previously, we considered only fanotify and dnotify to be problematic for other reasons) so make sure all allocations under group->mark_mutex happen with GFP_NOFS. Reported-and-tested-by: syzbot+c679f13773f295d2da53@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53 Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240927143642.2369508-1-lizhi.xu@windriver.com
2024-10-01cpufreq: Avoid a bad reference count on CPU nodeMiquel Sabaté Solà
In the parse_perf_domain function, if the call to of_parse_phandle_with_args returns an error, then the reference to the CPU device node that was acquired at the start of the function would not be properly decremented. Address this by declaring the variable with the __free(device_node) cleanup attribute. Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-01folio_queue: fix documentationChristian Brauner
s/folioq_count/folioq_full/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20241001134729.3f65ae78@canb.auug.org.au Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_sizeDaniel Borkmann
Commit 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") added a dev->gso_max_size test to gso_features_check() in order to fall back to GSO when needed. This was added as it was noticed that some drivers could misbehave if TSO packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size limit. For instance, a device could be configured with BIG TCP for IPv4, but not IPv6. Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size() and use the helper to respect both limits before falling back to GSO engine. Fixes: 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: Add netif_get_gro_max_size helper for GRODaniel Borkmann
Add a small netif_get_gro_max_size() helper which returns the maximum IPv4 or IPv6 GRO size of the netdevice. We later add a netif_get_gso_max_size() equivalent as well for GSO, so that these helpers can be used consistently instead of open-coded checks. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>