summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-06-14Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next: updates 2022-06-14 1) Updated HW bits and definitions for upcoming features 1.1) vport debug counters 1.2) flow meter 1.3) Execute ASO action for flow entry 1.4) enhanced CQE compression 2) Add ICM header-modify-pattern RDMA API Leon Says ========= SW steering manipulates packet's header using "modifying header" actions. Many of these actions do the same operation, but use different data each time. Currently we create and keep every one of these actions, which use expensive and limited resources. Now we introduce a new mechanism - pattern and argument, which splits a modifying action into two parts: 1. action pattern: contains the operations to be applied on packet's header, mainly set/add/copy of fields in the packet 2. action data/argument: contains the data to be used by each operation in the pattern. This way we reuse same patterns with different arguments to create new modifying actions, and since many actions share the same operations, we end up creating a small number of patterns that we keep in a dedicated cache. These modify header patterns are implemented as new type of ICM memory, so the following kernel patch series add the support for this new ICM type. ========== * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add bits and fields to support enhanced CQE compression net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK net/mlx5: group fdb cleanup to single function net/mlx5: Add support EXECUTE_ASO action for flow entry net/mlx5: Add HW definitions of vport debug counters net/mlx5: Add IFC bits and enums for flow meter RDMA/mlx5: Support handling of modify-header pattern ICM area net/mlx5: Manage ICM of type modify-header pattern net/mlx5: Introduce header-modify-pattern ICM properties ==================== Link: https://lore.kernel.org/r/20220614184028.51548-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-13net/mlx5: Add bits and fields to support enhanced CQE compressionOfer Levi
Expose ifc bits and add needed structure fields and methods to support enhanced CQE compression feature. The enhanced CQE compression feature improves cpu utiliziation with better packet latency from nic to host. Signed-off-by: Ofer Levi <oferle@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASKShay Drory
Remove not used MLX5_CAP_BITS_RW_MASK. While at it, remove CAP_MASK, MLX5_CAP_OFF_CMDIF_CSUM and MLX5_DEV_CAP_FLAG_*, since MLX5_CAP_BITS_RW_MASK was their only user. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Add support EXECUTE_ASO action for flow entryJianbo Liu
Attach flow meter to FTE with object id and index. Use metadata register C5 to store the packet color meter result. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Add HW definitions of vport debug countersSaeed Mahameed
total_q_under_processor_handle - number of queues in error state due to an async error or errored command. send_queue_priority_update_flow - number of QP/SQ priority/SL update events. cq_overrun - number of times CQ entered an error state due to an overflow. async_eq_overrun -number of time an EQ mapped to async events was overrun. comp_eq_overrun - number of time an EQ mapped to completion events was overrun. quota_exceeded_command - number of commands issued and failed due to quota exceeded. invalid_command - number of commands issued and failed dues to any reason other than quota exceeded. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Add IFC bits and enums for flow meterJianbo Liu
Add/extend structure layouts and defines for flow meter. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13RDMA/mlx5: Support handling of modify-header pattern ICM areaYevgeny Kliteynik
Add support for allocate/deallocate and registering MR of the new type of ICM area. Support exists only for devices that support sw_owner_v2. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Manage ICM of type modify-header patternYevgeny Kliteynik
Added support for managing new type of ICM for devices that support sw_owner_v2. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Introduce header-modify-pattern ICM propertiesYevgeny Kliteynik
Added new fields for device memory capabilities, in order to support creation of ICM memory for modify header patterns. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net: make __sys_accept4_file() staticYajun Deng
__sys_accept4_file() isn't used outside of the file, make it static. As the same time, move file_flags and nofile parameters into __sys_accept4_file(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13tcp: sk_forced_mem_schedule() optimizationEric Dumazet
sk_memory_allocated_add() has three callers, and returns to them @memory_allocated. sk_forced_mem_schedule() is one of them, and ignores the returned value. Change sk_memory_allocated_add() to return void. Change sock_reserve_memory() and __sk_mem_raise_allocated() to call sk_memory_allocated(). This removes one cache line miss [1] for RPC workloads, as first skbs in TCP write queue and receive queue go through sk_forced_mem_schedule(). [1] Cache line holding tcp_memory_allocated. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-10net: keep sk->sk_forward_alloc as small as possibleEric Dumazet
Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast. Each TCP socket can forward allocate up to 2 MB of memory, even after flow became less active. 10,000 sockets can have reserved 20 GB of memory, and we have no shrinker in place to reclaim that. Instead of trying to reclaim the extra allocations in some places, just keep sk->sk_forward_alloc values as small as possible. This should not impact performance too much now we have per-cpu reserves: Changes to tcp_memory_allocated should not be too frequent. For sockets not using SO_RESERVE_MEM: - idle sockets (no packets in tx/rx queues) have zero forward alloc. - non idle sockets have a forward alloc smaller than one page. Note: - Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: fix sk_wmem_schedule() and sk_rmem_schedule() errorsEric Dumazet
If sk->sk_forward_alloc is 150000, and we need to schedule 150001 bytes, we want to allocate 1 byte more (rounded up to one page), instead of 150001 :/ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: implement per-cpu reserves for memory_allocatedEric Dumazet
We plan keeping sk->sk_forward_alloc as small as possible in future patches. This means we are going to call sk_memory_allocated_add() and sk_memory_allocated_sub() more often. Implement a per-cpu cache of +1/-1 MB, to reduce number of changes to sk->sk_prot->memory_allocated, which would otherwise be cause of false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: add per_cpu_fw_alloc field to struct protoEric Dumazet
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use. Instead of having reserved bytes per socket, we want to have per-cpu reserves. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10net: remove SK_MEM_QUANTUM and SK_MEM_QUANTUM_SHIFTEric Dumazet
Due to memcg interface, SK_MEM_QUANTUM is effectively PAGE_SIZE. This might change in the future, but it seems better to avoid the confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Revert "net: set SK_MEM_QUANTUM to 4096"Eric Dumazet
This reverts commit bd68a2a854ad5a85f0c8d0a9c8048ca3f6391efb. This change broke memcg on arches with PAGE_SIZE != 4096 Later, commit 2bb2f5fb21b04 ("net: add new socket option SO_RESERVE_MEM") also assumed PAGE_SIZE==SK_MEM_QUANTUM Following patches in the series will greatly reduce the over allocations problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Merge tag 'ata-5.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fixes from Damien Le Moal: "Several small fixes for rc2: - Remove unused field in struct ata_port (Hannes) - Fix a potential (very unlikely) NULL pointer dereference in ata_host_alloc_pinfo() (Sergey) - Fix a device reference leak in the pata_octeon_cf driver (Miaoqian) - Fixes for handling access to the concurrent positioning ranges log page used with multi-actuator HDDs (Tyler) - Fix the values shown by the pio_mode and dma_mode sysfs device attributes (Sergey) - Update the MAINTAINERS file to add libata sysfs ABI documentation file (Sergey)" * tag 'ata-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: MAINTAINERS: add ATA sysfs file documentation to libata entry ata: libata-transport: fix {dma|pio|xfer}_mode sysfs files libata: fix translation of concurrent positioning ranges libata: fix reading concurrent positioning ranges log ata: pata_octeon_cf: Fix refcount leak in octeon_cf_probe ata: libata-core: fix NULL pointer deref in ata_host_alloc_pinfo() ata: libata: drop 'sas_last_tag'
2022-06-10Merge tag 'net-5.19-rc2-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Quick follow up, to cleanly fast-forward net again. Current release - new code bugs: - Revert "net/mlx5e: Allow relaxed ordering over VFs" Previous releases - regressions: - seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdev Misc: - rename TLS_INFO_ZC_SENDFILE to better express the meaning" * tag 'net-5.19-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: net: seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdev nfp: flower: restructure flow-key for gre+vlan combination nfp: avoid unnecessary check warnings in nfp_app_get_vf_config tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TX net/mlx5: fs, fail conflicting actions net/mlx5: Rearm the FW tracer after each tracer event net/mlx5: E-Switch, pair only capable devices net/mlx5e: CT: Fix cleanup of CT before cleanup of TC ct rules Revert "net/mlx5e: Allow relaxed ordering over VFs" MAINTAINERS: adjust MELLANOX ETHERNET INNOVA DRIVERS to TLS support removal
2022-06-10Merge tag 'for-linus-5.19a-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - a small cleanup removing "export" of an __init function - a small series adding a new infrastructure for platform flags - a series adding generic virtio support for Xen guests (frontend side) * tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: unexport __init-annotated xen_xlate_map_ballooned_pages() arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices xen/grant-dma-iommu: Introduce stub IOMMU driver dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops xen/virtio: Enable restricted memory access using Xen grant mappings xen/grant-dma-ops: Add option to restrict memory access under Xen xen/grants: support allocating consecutive grants arm/xen: Introduce xen_setup_dma_ops() virtio: replace arch_has_restricted_virtio_memory_access() kernel: add platform_has() infrastructure
2022-06-10Merge tag 'wireless-next-2022-06-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== wireless-next patches for v5.20 Here's a first set of patches for v5.20. This is just a queue flush, before we get things back from net-next that are causing conflicts, and then can start merging a lot of MLO (multi-link operation, part of 802.11be) code. Lots of cleanups all over. The only notable change is perhaps wilc1000 being the first driver to disable WEP (while enabling WPA3). * tag 'wireless-next-2022-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (29 commits) wifi: mac80211_hwsim: Directly use ida_alloc()/free() wifi: mac80211: refactor some key code wifi: mac80211: remove cipher scheme support wifi: nl80211: fix typo in comment wifi: virt_wifi: fix typo in comment rtw89: add new state to CFO state machine for UL-OFDMA rtw89: 8852c: add trigger frame counter ieee80211: add trigger frame definition wifi: wfx: Remove redundant NULL check before release_firmware() call wifi: rtw89: support MULTI_BSSID and correct BSSID mask of H2C wifi: ray_cs: Drop useless status variable in parse_addr() wifi: ray_cs: Utilize strnlen() in parse_addr() wifi: rtw88: use %*ph to print small buffer wifi: wilc1000: add IGTK support wifi: wilc1000: add WPA3 SAE support wifi: wilc1000: remove WEP security support wifi: wilc1000: use correct sequence of RESET for chip Power-UP/Down wifi: rtlwifi: fix error codes in rtl_debugfs_set_write_h2c() wifi: rtw88: Fix Sparse warning for rtw8821c_hw_spec wifi: rtw88: Fix Sparse warning for rtw8723d_hw_spec ... ==================== Link: https://lore.kernel.org/r/20220610142838.330862-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10wifi: mac80211: remove cipher scheme supportJohannes Berg
The only driver using this was iwlwifi, where we just removed the support because it was never really used. Remove the code from mac80211 as well. Change-Id: I1667417a5932315ee9d81f5c233c56a354923f09 Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10wifi: nl80211: fix typo in commentJulia Lawall
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220521111145.81697-77-Julia.Lawall@inria.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10ieee80211: add trigger frame definitionPo Hao Huang
Define trigger stype of control frame, and its checking function, struct and trigger type within common_info of trigger. Signed-off-by: Po Hao Huang <phhuang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220608113224.11193-2-pkshih@realtek.com
2022-06-09bonding: netlink error message support for optionsJonathan Toppins
Add support for reporting errors via extack in both bond_newlink and bond_changelink. Instead of having to look in the kernel log for why an option was not correct just report the error to the user via the extack variable. What is currently reported today: ip link add bond0 type bond ip link set bond0 up ip link set bond0 type bond mode 4 RTNETLINK answers: Device or resource busy After this change: ip link add bond0 type bond ip link set bond0 up ip link set bond0 type bond mode 4 Error: unable to set option because the bond is up. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09team: adopt u64_stats_tEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: adopt u64_stats_t in struct pcpu_sw_netstatsEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09vlan: adopt u64_stats_tEric Dumazet
As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Add READ_ONCE() when reading rx_errors & tx_dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09net: rename reference+tracking helpersJakub Kicinski
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TXMaxim Mikityanskiy
To embrace possible future optimizations of TLS, rename zerocopy sendfile definitions to more generic ones: * setsockopt: TLS_TX_ZEROCOPY_SENDFILE- > TLS_TX_ZEROCOPY_RO * sock_diag: TLS_INFO_ZC_SENDFILE -> TLS_INFO_ZC_RO_TX RO stands for readonly and emphasizes that the application shouldn't modify the data being transmitted with zerocopy to avoid potential disconnection. Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_contextDavid Howells
While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09Merge tag 'net-5.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
2022-06-08ipv6: Fix signed integer overflow in __ip6_append_dataWang Yufen
Resurrect ubsan overflow checks and ubsan report this warning, fix it by change the variable [length] type to size_t. UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19 2147479552 + 8567 cannot be represented in type 'int' CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x214/0x230 show_stack+0x30/0x78 dump_stack_lvl+0xf8/0x118 dump_stack+0x18/0x30 ubsan_epilogue+0x18/0x60 handle_overflow+0xd0/0xf0 __ubsan_handle_add_overflow+0x34/0x44 __ip6_append_data.isra.48+0x1598/0x1688 ip6_append_data+0x128/0x260 udpv6_sendmsg+0x680/0xdd0 inet6_sendmsg+0x54/0x90 sock_sendmsg+0x70/0x88 ____sys_sendmsg+0xe8/0x368 ___sys_sendmsg+0x98/0xe0 __sys_sendmmsg+0xf4/0x3b8 __arm64_sys_sendmmsg+0x34/0x48 invoke_syscall+0x64/0x160 el0_svc_common.constprop.4+0x124/0x300 do_el0_svc+0x44/0xc8 el0_svc+0x3c/0x1e8 el0t_64_sync_handler+0x88/0xb0 el0t_64_sync+0x16c/0x170 Changes since v1: -Change the variable [length] type to unsigned, as Eric Dumazet suggested. Changes since v2: -Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested. Changes since v3: -Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as Jakub Kicinski suggested. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: constify some inline functions in sock.hPeter Lafreniere
Despite these inline functions having full visibility to the compiler at compile time, they still strip const from passed pointers. This change allows for functions in various network drivers to be marked as const that could not be marked const before. Signed-off-by: Peter Lafreniere <pjlafren@mtu.edu> Link: https://lore.kernel.org/r/20220606113458.35953-1-pjlafren@mtu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix NAT support for NFPROTO_INET without layer 3 address, from Florian Westphal. 2) Use kfree_rcu(ptr, rcu) variant in nf_tables clean_net path. 3) Use list to collect flowtable hooks to be deleted. 4) Initialize list of hook field in flowtable transaction. 5) Release hooks on error for flowtable updates. 6) Memleak in hardware offload rule commit and abort paths. 7) Early bail out in case device does not support for hardware offload. This adds a new interface to net/core/flow_offload.c to check if the flow indirect block list is empty. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: bail out early if hardware offload is not supported netfilter: nf_tables: memleak flow rule from commit path netfilter: nf_tables: release new hooks on unsupported flowtable flags netfilter: nf_tables: always initialize flowtable hook list in transaction netfilter: nf_tables: delete flowtable hooks via transaction list netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net path netfilter: nat: really support inet nat without l3 address ==================== Link: https://lore.kernel.org/r/20220606212055.98300-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-07net: dropreason: reformat the comment fo skb drop reasonsMenglong Dong
To make the code clear, reformat the comment in dropreason.h to k-doc style. Now, the comment can pass the check of kernel-doc without warnning: $ ./scripts/kernel-doc -v -none include/linux/dropreason.h include/linux/dropreason.h:7: info: Scanning doc for enum skb_drop_reason Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-07net: skb: use auto-generation to convert skb drop reason to stringMenglong Dong
It is annoying to add new skb drop reasons to 'enum skb_drop_reason' and TRACE_SKB_DROP_REASON in trace/event/skb.h, and it's easy to forget to add the new reasons we added to TRACE_SKB_DROP_REASON. TRACE_SKB_DROP_REASON is used to convert drop reason of type number to string. For now, the string we passed to user space is exactly the same as the name in 'enum skb_drop_reason' with a 'SKB_DROP_REASON_' prefix. Therefore, we can use 'auto-generation' to generate these drop reasons to string at build time. The new source 'dropreason_str.c' will be auto generated during build time, which contains the string array 'const char * const drop_reasons[]'. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-07net: skb: move enum skb_drop_reason to standalone header fileMenglong Dong
As the skb drop reasons are getting more and more, move the enum 'skb_drop_reason' and related function to the standalone header 'dropreason.h', as Jakub Kicinski suggested. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-06netfilter: nf_tables: bail out early if hardware offload is not supportedPablo Neira Ayuso
If user requests for NFT_CHAIN_HW_OFFLOAD, then check if either device provides the .ndo_setup_tc interface or there is an indirect flow block that has been registered. Otherwise, bail out early from the preparation phase. Moreover, validate that family == NFPROTO_NETDEV and hook is NF_NETDEV_INGRESS. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-06arm/xen: Assign xen-grant DMA ops for xen-grant DMA devicesOleksandr Tyshchenko
By assigning xen-grant DMA ops we will restrict memory access for passed device using Xen grant mappings. This is needed for using any virtualized device (e.g. virtio) in Xen guests in a safe manner. Please note, for the virtio devices the XEN_VIRTIO config should be enabled (it forces ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS). Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1654197833-25362-9-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devicesOleksandr Tyshchenko
Use the presence of "iommus" property pointed to the IOMMU node with recently introduced "xen,grant-dma" compatible as a clear indicator of enabling Xen grant mappings scheme for that device and read the ID of Xen domain where the corresponding backend is running. The domid (domain ID) is used as an argument to the Xen grant mapping APIs. To avoid the deferred probe timeout which takes place after reusing generic IOMMU device tree bindings (because the IOMMU device never becomes available) enable recently introduced stub IOMMU driver by selecting XEN_GRANT_DMA_IOMMU. Also introduce xen_is_grant_dma_device() to check whether xen-grant DMA ops need to be set for a passed device. Remove the hardcoded domid 0 in xen_grant_setup_dma_ops(). Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1654197833-25362-8-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06xen/virtio: Enable restricted memory access using Xen grant mappingsJuergen Gross
In order to support virtio in Xen guests add a config option XEN_VIRTIO enabling the user to specify whether in all Xen guests virtio should be able to access memory via Xen grant mappings only on the host side. Also set PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS feature from the guest initialization code on Arm and x86 if CONFIG_XEN_VIRTIO is enabled. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/1654197833-25362-5-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06xen/grant-dma-ops: Add option to restrict memory access under XenJuergen Gross
Introduce Xen grant DMA-mapping layer which contains special DMA-mapping routines for providing grant references as DMA addresses to be used by frontends (e.g. virtio) in Xen guests. Add the needed functionality by providing a special set of DMA ops handling the needed grant operations for the I/O pages. The subsequent commit will introduce the use case for xen-grant DMA ops layer to enable using virtio devices in Xen guests in a safe manner. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/1654197833-25362-4-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06xen/grants: support allocating consecutive grantsJuergen Gross
For support of virtio via grant mappings in rare cases larger mappings using consecutive grants are needed. Support those by adding a bitmap of free grants. As consecutive grants will be needed only in very rare cases (e.g. when configuring a virtio device with a multi-page ring), optimize for the normal case of non-consecutive allocations. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/1654197833-25362-3-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06arm/xen: Introduce xen_setup_dma_ops()Oleksandr Tyshchenko
This patch introduces new helper and places it in new header. The helper's purpose is to assign any Xen specific DMA ops in a single place. For now, we deal with xen-swiotlb DMA ops only. The one of the subsequent commits in current series will add xen-grant DMA ops case. Also re-use the xen_swiotlb_detect() check on Arm32. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> [For arm64] Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/1654197833-25362-2-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06virtio: replace arch_has_restricted_virtio_memory_access()Juergen Gross
Instead of using arch_has_restricted_virtio_memory_access() together with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those with platform_has() and a new platform feature PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Borislav Petkov <bp@suse.de>
2022-06-06kernel: add platform_has() infrastructureJuergen Gross
Add a simple infrastructure for setting, resetting and querying platform feature flags. Flags can be either global or architecture specific. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-06ata: libata: drop 'sas_last_tag'Hannes Reinecke
Unused now. Fixes: 4f1a22ee7b57 ("libata: Improve ATA queued command allocation") Cc: John Garry <john.garry@huawei.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>