summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-05-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-14 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON. 2) support for narrow loads in bpf_sock_addr progs and additional helpers in cg-skb progs, from Andrey. 3) bpf benchmark runner, from Andrii. 4) arm and riscv JIT optimizations, from Luke. 5) bpf iterator infrastructure, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14net: qed: fix bad formattingIgor Russkikh
On some adjacent code, fix bad code formatting Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14net: qed: attention clearing propertiesIgor Russkikh
On different hardware events we have to respond differently, on some of hardware indications hw attention (error condition) should be cleared by the driver to continue normal functioning. Here we introduce attention clear flags, and put them on some important events (in aeu_descs). Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-14net: qed: adding hw_err states and handlingIgor Russkikh
Here we introduce qed device error tracking flags and error types. qed_hw_err_notify is an entrace point to report errors. It'll notify higher level drivers (qede/qedr/etc) to handle and recover the error. List of posible errors comes from hardware interfaces, but could be extended in future. Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-13net: phy: broadcom: add cable test supportMichael Walle
Most modern broadcom PHYs support ECD (enhanced cable diagnostics). Add support for it in the bcm-phy-lib so they can easily be used in the PHY driver. There are two access methods for ECD: legacy by expansion registers and via the new RDB registers which are exclusive. Provide functions in two variants where the PHY driver can choose from. To keep things simple for now, we just switch the register access to expansion registers in the RDB variant for now. On the flipside, we have to keep a bus lock to prevent any other non-legacy access on the PHY. The results of the intra-pair tests are inconclusive (at least for the BCM54140). Most of the times half the length is reported but sometimes the length is correct. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-13bpf: Enable bpf_iter targets registering ctx argument typesYonghong Song
Commit b121b341e598 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support") adds a field btf_id_or_null_non0_off to bpf_prog->aux structure to indicate that the first ctx argument is PTR_TO_BTF_ID reg_type and all others are PTR_TO_BTF_ID_OR_NULL. This approach does not really scale if we have other different reg types in the future, e.g., a pointer to a buffer. This patch enables bpf_iter targets registering ctx argument reg types which may be different from the default one. For example, for pointers to structures, the default reg_type is PTR_TO_BTF_ID for tracing program. The target can register a particular pointer type as PTR_TO_BTF_ID_OR_NULL which can be used by the verifier to enforce accesses. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180221.2949882-1-yhs@fb.com
2020-05-13bpf: Change func bpf_iter_unreg_target() signatureYonghong Song
Change func bpf_iter_unreg_target() parameter from target name to target reg_info, similar to bpf_iter_reg_target(). Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180220.2949737-1-yhs@fb.com
2020-05-13bpf: net: Refactor bpf_iter target registrationYonghong Song
Currently bpf_iter_reg_target takes parameters from target and allocates memory to save them. This is really not necessary, esp. in the future we may grow information passed from targets to bpf_iter manager. The patch refactors the code so target reg_info becomes static and bpf_iter manager can just take a reference to it. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200513180219.2949605-1-yhs@fb.com
2020-05-13bpf: Change btf_iter func proto prefix to "bpf_iter_"Yonghong Song
This is to be consistent with tracing and lsm programs which have prefix "bpf_trace_" and "bpf_lsm_" respectively. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180216.2949387-1-yhs@fb.com
2020-05-12net: dsa: tag_sja1105: implement sub-VLAN decodingVladimir Oltean
Create a subvlan_map as part of each port's tagger private structure. This keeps reverse mappings of bridge-to-dsa_8021q VLAN retagging rules. Note that as of this patch, this piece of code is never engaged, due to the fact that the driver hasn't installed any retagging rule, so we'll always see packets with a subvlan code of 0 (untagged). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANsVladimir Oltean
For switches that support VLAN retagging, such as sja1105, we extend dsa_8021q by encoding a "sub-VLAN" into the remaining 3 free bits in the dsa_8021q tag. A sub-VLAN is nothing more than a number in the range 0-7, which serves as an index into a per-port driver lookup table. The sub-VLAN value of zero means that traffic is untagged (this is also backwards-compatible with dsa_8021q without retagging). The switch should be configured to retag VLAN-tagged traffic that gets transmitted towards the CPU port (and towards the CPU only). Example: bridge vlan add dev sw1p0 vid 100 The switch retags frames received on port 0, going to the CPU, and having VID 100, to the VID of 1104 (0x0450). In dsa_8021q language: | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | +-----------+-----+-----------------+-----------+-----------------------+ | DIR | SVL | SWITCH_ID | SUBVLAN | PORT | +-----------+-----+-----------------+-----------+-----------------------+ 0x0450 means: - DIR = 0b01: this is an RX VLAN - SUBVLAN = 0b001: this is subvlan #1 - SWITCH_ID = 0b001: this is switch 1 (see the name "sw1p0") - PORT = 0b0000: this is port 0 (see the name "sw1p0") The driver also remembers the "1 -> 100" mapping. In the hotpath, if the sub-VLAN from the tag encodes a non-untagged frame, this mapping is used to create a VLAN hwaccel tag, with the value of 100. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneouslyVladimir Oltean
In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of 0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering mode, it needs to work with normal VLAN TPID values. A complication arises when we must transmit a VLAN-tagged packet to the switch when it's in VLAN-aware mode. We need to construct a packet with 2 VLAN tags, and the switch will use the outer header for routing and pop it on egress. But sadly, here the 2 hardware generations don't behave the same: - E/T switches won't pop an ETH_P_8021AD tag on egress, it seems (packets will remain double-tagged). - P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks like it tries to prevent VLAN hopping). But looks like the reverse is also true: - E/T switches have no problem popping the outer tag from packets with 2 ETH_P_8021Q tags. - P/Q/R/S will have no problem popping a single tag even if that is ETH_P_8021AD. So it is clear that if we want the hardware to work with dsa_8021q tagging in VLAN-aware mode, we need to send different TPIDs depending on revision. Keep that information in priv->info->qinq_tpid. The per-port tagger structure will hold an xmit_tpid value that depends not only upon the qinq_tpid, but also upon the VLAN awareness state itself (in case we must transmit using 0xdadb). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12net: dsa: sja1105: save/restore VLANs using a delta commit methodVladimir Oltean
Managing the VLAN table that is present in hardware will become very difficult once we add a third operating state (best_effort_vlan_filtering). That is because correct cleanup (not too little, not too much) becomes virtually impossible, when VLANs can be added from the bridge layer, from dsa_8021q for basic tagging, for cross-chip bridging, as well as retagging rules for sub-VLANs and cross-chip sub-VLANs. So we need to rethink VLAN interaction with the switch in a more scalable way. In preparation for that, use the priv->expect_dsa_8021q boolean to classify any VLAN request received through .port_vlan_add or .port_vlan_del towards either one of 2 internal lists: bridge VLANs and dsa_8021q VLANs. Then, implement a central sja1105_build_vlan_table method that creates a VLAN configuration from scratch based on the 2 lists of VLANs kept by the driver, and based on the VLAN awareness state. Currently, if we are VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs. Then, implement a delta commit procedure that identifies which VLANs from this new configuration are actually different from the config previously committed to hardware. We apply the delta through the dynamic configuration interface (we don't reset the switch). The result is that the hardware should see the exact sequence of operations as before this patch. This also helps remove the "br" argument passed to dsa_8021q_crosschip_bridge_join, which it was only using to figure out whether it should commit the configuration back to us or not, based on the VLAN awareness state of the bridge. We can simplify that, by always allowing those VLANs inside of our dsa_8021q_vlans list, and committing those to hardware when necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12net: dsa: tag_8021q: introduce a vid_is_dsa_8021q helperVladimir Oltean
This function returns a boolean denoting whether the VLAN passed as argument is part of the 1024-3071 range that the dsa_8021q tagging scheme uses. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-11net: cleanly handle kernel vs user buffers for ->msg_controlChristoph Hellwig
The msg_control field in struct msghdr can either contain a user pointer when used with the recvmsg system call, or a kernel pointer when used with sendmsg. To complicate things further kernel_recvmsg can stuff a kernel pointer in and then use set_fs to make the uaccess helpers accept it. Replace it with a union of a kernel pointer msg_control field, and a user pointer msg_control_user one, and allow kernel_recvmsg operate on a proper kernel pointer using a bitfield to override the normal choice of a user pointer for recvmsg. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-11net: add a CMSG_USER_DATA macroChristoph Hellwig
Add a variant of CMSG_DATA that operates on user pointer to avoid sparse warnings about casting to/from user pointers. Also fix up CMSG_DATA to rely on the gcc extension that allows void pointer arithmetics to cut down on the amount of casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-11team: Replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-10net: dsa: sja1105: implement cross-chip bridging operationsVladimir Oltean
sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart and which is compatible with cascading. A complete description of this tagging format is in net/dsa/tag_8021q.c, but a quick summary is that each external-facing port tags incoming frames with a unique pvid, and this special VLAN is transmitted as tagged towards the inside of the system, and as untagged towards the exterior. The tag encodes the switch id and the source port index. This means that cross-chip bridging for dsa_8021q only entails adding the dsa_8021q pvids of one switch to the RX filter of the other switches. Everything else falls naturally into place, as long as the bottom-end of ports (the leaves in the tree) is comprised exclusively of dsa_8021q-compatible (i.e. sja1105 switches). Otherwise, there would be a chance that a front-panel switch transmits a packet tagged with a dsa_8021q header, header which it wouldn't be able to remove, and which would hence "leak" out. The only use case I tested (due to lack of board availability) was when the sja1105 switches are part of disjoint trees (however, this doesn't change the fact that multiple sja1105 switches still need unique switch identifiers in such a system). But in principle, even "true" single-tree setups (with DSA links) should work just as fine, except for a small change which I can't test: dsa_towards_port should be used instead of dsa_upstream_port (I made the assumption that the routing port that any sja1105 should use towards its neighbours is the CPU port. That might not hold true in other setups). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: ethtool: Add helpers for reporting test resultsAndrew Lunn
The PHY drivers can use these helpers for reporting the results. The results get translated into netlink attributes which are added to the pre-allocated skbuf. v3: Poison phydev->skb Return -EMSGSIZE when ethnl_bcastmsg_put() fails Return valid error code when nla_nest_start() fails Use u8 for results Actually put u32 length into message v4: s/ENOTSUPP/EOPNOTSUPP/g Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: ethtool: Add infrastructure for reporting cable test resultsAndrew Lunn
Provide infrastructure for PHY drivers to report the cable test results. A netlink skb is associated to the phydev. Helpers will be added which can add results to this skb. Once the test has finished the results are sent to user space. When netlink ethtool is not part of the kernel configuration stubs are provided. It is also impossible to trigger a cable test, so the error code returned by the alloc function is of no consequence. v2: Include the status complete in the netlink notification message v4: Replace -EINVAL with -EMSGSIZE Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: phy: Add support for polling cable testAndrew Lunn
Some PHYs are not capable of generating interrupts when a cable test finished. They do however support interrupts for normal operations, like link up/down. As such, the PHY state machine would normally not poll the PHY. Add support for indicating the PHY state machine must poll the PHY when performing a cable test. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: phy: Add cable test support to state machineAndrew Lunn
Running a cable test is desruptive to normal operation of the PHY and can take a 5 to 10 seconds to complete. The RTNL lock cannot be held for this amount of time, and add a new state to the state machine for running a cable test. The driver is expected to implement two functions. The first is used to start a cable test. Once the test has started, it should return. The second function is called once per second, or on interrupt to check if the cable test is complete, and to allow the PHY to report the status. v2: Rename phy_cable_test_abort to phy_abort_cable_test Return different extack when already running test Use phy_init_hw() to reset the PHY Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09IB/mlx4: Replace zero-length array with flexible-arrayGustavo A. R. Silva
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09bpf: Add PTR_TO_BTF_ID_OR_NULL supportYonghong Song
Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support. For tracing/iter program, the bpf program context definition, e.g., for previous bpf_map target, looks like struct bpf_iter__bpf_map { struct bpf_iter_meta *meta; struct bpf_map *map; }; The kernel guarantees that meta is not NULL, but map pointer maybe NULL. The NULL map indicates that all objects have been traversed, so bpf program can take proper action, e.g., do final aggregation and/or send final report to user space. Add btf_id_or_null_non0_off to prog->aux structure, to indicate that if the context access offset is not 0, set to PTR_TO_BTF_ID_OR_NULL instead of PTR_TO_BTF_ID. This bit is set for tracing/iter program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175912.2476576-1-yhs@fb.com
2020-05-09net: bpf: Add netlink and ipv6_route bpf_iter targetsYonghong Song
This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
2020-05-09bpf: Add bpf_map iteratorYonghong Song
Implement seq_file operations to traverse all bpf_maps. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175909.2476096-1-yhs@fb.com
2020-05-09bpf: Implement common macros/helpers for target iteratorsYonghong Song
Macro DEFINE_BPF_ITER_FUNC is implemented so target can define an init function to capture the BTF type which represents the target. The bpf_iter_meta is a structure holding meta data, common to all targets in the bpf program. Additional marker functions are called before or after bpf_seq_read() show()/next()/stop() callback functions to help calculate precise seq_num and whether call bpf_prog inside stop(). Two functions, bpf_iter_get_info() and bpf_iter_run_prog(), are implemented so target can get needed information from bpf_iter infrastructure and can run the program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175907.2475956-1-yhs@fb.com
2020-05-09bpf: Create file bpf iteratorYonghong Song
To produce a file bpf iterator, the fd must be corresponding to a link_fd assocciated with a trace/iter program. When the pinned file is opened, a seq_file will be generated. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175906.2475893-1-yhs@fb.com
2020-05-09bpf: Create anonymous bpf iteratorYonghong Song
A new bpf command BPF_ITER_CREATE is added. The anonymous bpf iterator is seq_file based. The seq_file private data are referenced by targets. The bpf_iter infrastructure allocated additional space at seq_file->private before the space used by targets to store some meta data, e.g., prog: prog to run session_id: an unique id for each opened seq_file seq_num: how many times bpf programs are queried in this session done_stop: an internal state to decide whether bpf program should be called in seq_ops->stop() or not The seq_num will start from 0 for valid objects. The bpf program may see the same seq_num more than once if - seq_file buffer overflow happens and the same object is retried by bpf_seq_read(), or - the bpf program explicitly requests a retry of the same object Since module is not supported for bpf_iter, all target registeration happens at __init time, so there is no need to change bpf_iter_unreg_target() as it is used mostly in error path of the init function at which time no bpf iterators have been created yet. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
2020-05-09bpf: Support bpf tracing/iter programs for BPF_LINK_CREATEYonghong Song
Given a bpf program, the step to create an anonymous bpf iterator is: - create a bpf_iter_link, which combines bpf program and the target. In the future, there could be more information recorded in the link. A link_fd will be returned to the user space. - create an anonymous bpf iterator with the given link_fd. The bpf_iter_link can be pinned to bpffs mount file system to create a file based bpf iterator as well. The benefit to use of bpf_iter_link: - using bpf link simplifies design and implementation as bpf link is used for other tracing bpf programs. - for file based bpf iterator, bpf_iter_link provides a standard way to replace underlying bpf programs. - for both anonymous and free based iterators, bpf link query capability can be leveraged. The patch added support of tracing/iter programs for BPF_LINK_CREATE. A new link type BPF_LINK_TYPE_ITER is added to facilitate link querying. Currently, only prog_id is needed, so there is no additional in-kernel show_fdinfo() and fill_link_info() hook is needed for BPF_LINK_TYPE_ITER link. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
2020-05-09bpf: Allow loading of a bpf_iter programYonghong Song
A bpf_iter program is a tracing program with attach type BPF_TRACE_ITER. The load attribute attach_btf_id is used by the verifier against a particular kernel function, which represents a target, e.g., __bpf_iter__bpf_map for target bpf_map which is implemented later. The program return value must be 0 or 1 for now. 0 : successful, except potential seq_file buffer overflow which is handled by seq_file reader. 1 : request to restart the same object In the future, other return values may be used for filtering or teminating the iterator. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
2020-05-09bpf: Implement an interface to register bpf_iter targetsYonghong Song
The target can call bpf_iter_reg_target() to register itself. The needed information: target: target name seq_ops: the seq_file operations for the target init_seq_private target callback to initialize seq_priv during file open fini_seq_private target callback to clean up seq_priv during file release seq_priv_size: the private_data size needed by the seq_file operations The target name represents a target which provides a seq_ops for iterating objects. The target can provide two callback functions, init_seq_private and fini_seq_private, called during file open/release time. For example, /proc/net/{tcp6, ipv6_route, netlink, ...}, net name space needs to be setup properly during file open and released properly during file release. Function bpf_iter_unreg_target() is also implemented to unregister a particular target. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175859.2474669-1-yhs@fb.com
2020-05-09Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge includes updates to bonding driver needed for the rdma stack, to avoid conflicts with the RDMA branch. Maor Gottlieb Says: ==================== Bonding: Add support to get xmit slave The following series adds support to get the LAG master xmit slave by introducing new .ndo - ndo_get_xmit_slave. Every LAG module can implement it and it first implemented in the bond driver. This is follow-up to the RFC discussion [1]. The main motivation for doing this is for drivers that offload part of the LAG functionality. For example, Mellanox Connect-X hardware implements RoCE LAG which selects the TX affinity when the resources are created and port is remapped when it goes down. The first part of this patchset introduces the new .ndo and add the support to the bonding module. The second part adds support to get the RoCE LAG xmit slave by building skb of the RoCE packet based on the AH attributes and call to the new .ndo. The third part change the mlx5 driver driver to set the QP's affinity port according to the slave which found by the .ndo. ==================== Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-07netpoll: accept NULL np argument in netpoll_send_skb()Eric Dumazet
netpoll_send_skb() callers seem to leak skb if the np pointer is NULL. While this should not happen, we can make the code more robust. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07netpoll: netpoll_send_skb() returns transmit statusEric Dumazet
Some callers want to know if the packet has been sent or dropped, to inform upper stacks. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07netpoll: move netpoll_send_skb() out of lineEric Dumazet
There is no need to inline this helper, as we intend to add more code in this function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07netpoll: remove dev argument from netpoll_send_skb_on_dev()Eric Dumazet
netpoll_send_skb_on_dev() can get the device pointer directly from np->dev Rename it to __netpoll_send_skb() Following patch will move netpoll_send_skb() out-of-line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07Merge tag 'wireless-drivers-next-2020-05-07' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.8 First set of patches for v5.8. Changes all over, ath10k apparently seeing most new features this time. rtw88 also had lots of changes due to preparation for new hardware support. In this pull request there's also a new macro to include/linux/iopoll: read_poll_timeout_atomic(). This is needed by rtw88 for atomic polling. Major changes: ath11k * add debugfs file for testing ADDBA and DELBA * add 802.11 encapsulation offload on hardware support * add htt_peer_stats_reset debugfs file ath10k * enable VHT160 and VHT80+80 modes * enable radar detection in secondary segment * sdio: disable TX complete indication to improve throughput * sdio: decrease power consumption * sdio: add HTT TX bundle support to increase throughput * sdio: add rx bitrate reporting ath9k * improvements to AR9002 calibration logic carl9170 * remove buggy P2P_GO support p54usb * add support for AirVasT USB stick rtw88 * add support for antenna configuration ti wlcore * add support for AES_CMAC cipher iwlwifi * support for a few new FW API versions * new hw configs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix reference count leaks in various parts of batman-adv, from Xiyu Yang. 2) Update NAT checksum even when it is zero, from Guillaume Nault. 3) sk_psock reference count leak in tls code, also from Xiyu Yang. 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in fq_codel, from Eric Dumazet. 5) Fix panic in choke_reset(), also from Eric Dumazet. 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan. 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet. 8) Fix crash in x25_disconnect(), from Yue Haibing. 9) Don't pass pointer to local variable back to the caller in nf_osf_hdr_ctx_init(), from Arnd Bergmann. 10) Wireguard should use the ECN decap helper functions, from Toke Høiland-Jørgensen. 11) Fix command entry leak in mlx5 driver, from Moshe Shemesh. 12) Fix uninitialized variable access in mptcp's subflow_syn_recv_sock(), from Paolo Abeni. 13) Fix unnecessary out-of-order ingress frame ordering in macsec, from Scott Dial. 14) IPv6 needs to use a global serial number for dst validation just like ipv4, from David Ahern. 15) Fix up PTP_1588_CLOCK deps, from Clay McClure. 16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from Yoshiyuki Kurauchi. 17) Fix a regression in that dsa user port errors should not be fatal, from Florian Fainelli. 18) Fix iomap leak in enetc driver, from Dejin Zheng. 19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang. 20) Initialize protocol value earlier in neigh code paths when generating events, from Roman Mashak. 21) netdev_update_features() must be called with RTNL mutex in macsec driver, from Antoine Tenart. 22) Validate untrusted GSO packets even more strictly, from Willem de Bruijn. 23) Wireguard decrypt worker needs a cond_resched(), from Jason Donenfeld. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning wireguard: send/receive: cond_resched() when processing worker ringbuffers wireguard: socket: remove errant restriction on looping to self wireguard: selftests: use normal kernel stack size on ppc64 net: ethernet: ti: am65-cpsw-nuss: fix irqs type ionic: Use debugfs_create_bool() to export bool net: dsa: Do not leave DSA master with NULL netdev_ops net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred net: stricter validation of untrusted gso packets seg6: fix SRH processing to comply with RFC8754 net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms net: dsa: ocelot: the MAC table on Felix is twice as large net: dsa: sja1105: the PTP_CLK extts input reacts on both edges selftests: net: tcp_mmap: fix SO_RCVLOWAT setting net: hsr: fix incorrect type usage for protocol variable net: macsec: fix rtnl locking issue net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() ...
2020-05-06ethtool: provide UAPI for PHY master/slave configuration.Oleksij Rempel
This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of auto-negotiation support, we needed to be able to configure the MASTER-SLAVE role of the port manually or from an application in user space. The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to force MASTER or SLAVE role. See IEEE 802.3-2018: 22.2.4.3.7 MASTER-SLAVE control register (Register 9) 22.2.4.3.8 MASTER-SLAVE status register (Register 10) 40.5.2 MASTER-SLAVE configuration resolution 45.2.1.185.1 MASTER-SLAVE config value (1.2100.14) 45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32) The MASTER-SLAVE role affects the clock configuration: ------------------------------------------------------------------------------- When the PHY is configured as MASTER, the PMA Transmit function shall source TX_TCLK from a local clock source. When configured as SLAVE, the PMA Transmit function shall source TX_TCLK from the clock recovered from data stream provided by MASTER. iMX6Q KSZ9031 XXX ------\ /-----------\ /------------\ | | | | | MAC |<----RGMII----->| PHY Slave |<------>| PHY Master | |<--- 125 MHz ---+-<------/ | | \ | ------/ \-----------/ \------------/ ^ \-TX_TCLK ------------------------------------------------------------------------------- Since some clock or link related issues are only reproducible in a specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial to provide generic (not 100BASE-T1 specific) interface to the user space for configuration flexibility and trouble shooting. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: stricter validation of untrusted gso packetsWillem de Bruijn
Syzkaller again found a path to a kernel crash through bad gso input: a packet with transport header extending beyond skb_headlen(skb). Tighten validation at kernel entry: - Verify that the transport header lies within the linear section. To avoid pulling linux/tcp.h, verify just sizeof tcphdr. tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use. - Match the gso_type against the ip_proto found by the flow dissector. Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge tag 'perf-for-bpf-2020-05-06' of ↵Alexei Starovoitov
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next CAP_PERFMON for BPF
2020-05-06timer: add fsleep for flexible sleepingHeiner Kallweit
Sleeping for a certain amount of time requires use of different functions, depending on the time period. Documentation/timers/timers-howto.rst explains when to use which function, and also checkpatch checks for some potentially problematic cases. So let's create a helper that automatically chooses the appropriate sleep function -> fsleep(), for flexible sleeping If the delay is a constant, then the compiler should be able to ensure that the new helper doesn't create overhead. If the delay is not constant, then the new helper can save some code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform fix from Benson Leung: "Fix a resource allocation issue in cros_ec_sensorhub.c" * tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_sensorhub: Allocate sensorhub resource before claiming sensors
2020-05-06net: phy: add concept of shared storage for PHYsMichael Walle
There are packages which contain multiple PHY devices, eg. a quad PHY transceiver. Provide functions to allocate and free shared storage. Usually, a quad PHY contains global registers, which don't belong to any PHY. Provide convenience functions to access these registers. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06iopoll: Introduce read_poll_timeout_atomic macroKai-Heng Feng
Like read_poll_timeout, an atomic variant for multiple parameter read function can be useful. Will be used by a later patch. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200424184918.30360-1-kai.heng.feng@canonical.com
2020-05-04bpf: Avoid gcc-10 stringop-overflow warning in struct bpf_progArnd Bergmann
gcc-10 warns about accesses to zero-length arrays: kernel/bpf/core.c: In function 'bpf_patch_insn_single': cc1: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=] In file included from kernel/bpf/core.c:21: include/linux/filter.h:550:20: note: at offset 0 to object 'insnsi' with size 0 declared here 550 | struct bpf_insn insnsi[0]; | ^~~~~~ In this case, we really want to have two flexible-array members, but that is not possible. Removing the union to make insnsi a flexible-array member while leaving insns as a zero-length array fixes the warning, as nothing writes to the other one in that way. This trick only works on linux-3.18 or higher, as older versions had additional members in the union. Fixes: 60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200430213101.135134-6-arnd@arndb.de
2020-05-04net: partially revert dynamic lockdep key changesCong Wang
This patch reverts the folowing commits: commit 064ff66e2bef84f1153087612032b5b9eab005bd "bonding: add missing netdev_update_lockdep_key()" commit 53d374979ef147ab51f5d632dfe20b14aebeccd0 "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()" commit 1f26c0d3d24125992ab0026b0dab16c08df947c7 "net: fix kernel-doc warning in <linux/netdevice.h>" commit ab92d68fc22f9afab480153bd82a20f6e2533769 "net: core: add generic lockdep keys" but keeps the addr_list_lock_key because we still lock addr_list_lock nestedly on stack devices, unlikely xmit_lock this is safe because we don't take addr_list_lock on any fast path. Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: add helper eth_hw_addr_crcHeiner Kallweit
Several drivers use the same code as basis for filter hashes. Therefore let's factor it out to a helper. This way drivers don't have to access struct netdev_hw_addr internals. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>