summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-03-14ieee802154: change wpan-phy name to phyAlexander Aring
Currently the wpan_phy under /sys/class/ieee802154/ is named as "wpan-phy#", this patch will change the name to phy. This will introduce the same naming convention like wireless. Note: wpan-tools users will not type "wpan-phy#" anymore, just a simple "phy#" is enough. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-14ieee802154: 6lowpan: fix ARPHRD to ARPHRD_6LOWPANAlexander Aring
Currently there exists two interface types with ARPHRD_IEEE802154. These are the 802.15.4 interfaces and 802.15.4 6LoWPAN interfaces. This is more a bug because some userspace applications checks on this value like wireshark. This occurs that wireshark will always try to parse a lowpan interface as 802.15.4 frames. With ARPHRD_6LOWPAN wireshark will parse it as IPv6 frames which is correct. Much applications checks on this value to readout the EUI64 mac address which should be the same for ARPHRD_6LOWPAN. BTLE 6LoWPAN and ieee802154 6LoWPAN will share now the same ARPHRD. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-13Bluetooth: Merge hdev->dbg_flags fields into hdev->dev_flagsMarcel Holtmann
With the extension of hdev->dev_flags utilizing a bitmap now, the space is no longer restricted. Merge the hdev->dbg_flags into hdev->dev_flags to save space on 64-bit architectures. On 32-bit architectures no size reduction happens. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Use DECLARE_BITMAP for hdev->dev_flags fieldMarcel Holtmann
The hdev->dev_flags field has outgrown itself on 32-bit systems. So instead of hacking around it, switch to using DECLARE_BITMAP. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Introduce hci_dev_test_and_set_flag helper macroMarcel Holtmann
Instead of manually coding test_and_set_bit on hdev->dev_flags all the time, use hci_dev_test_and_set_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Introduce hci_dev_test_and_clear_flag helper macroMarcel Holtmann
Instead of manually coding test_and_clear_bit on hdev->dev_flags all the time, use hci_dev_test_and_clear_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Introduce hci_dev_test_and_change_flag helper macroMarcel Holtmann
Instead of manually coding test_and_change_bit on hdev->dev_flags all the time, use hci_dev_test_and_change_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Introduce hci_dev_change_flag helper macroMarcel Holtmann
Instead of manually coding change_bit on hdev->dev_flags all the time, use hci_dev_change_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Introduce hci_dev_clear_flag helper macroMarcel Holtmann
Instead of manually coding clear_bit on hdev->dev_flags all the time, use hci_dev_clear_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Introduce hci_dev_set_flag helper macroMarcel Holtmann
Instead of manually coding set_bit on hdev->dev_flags all the time, use hci_dev_set_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Introduce hci_dev_test_flag helper macroMarcel Holtmann
Instead of manually coding test_bit on hdev->dev_flags all the time, use hci_dev_test_flag helper macro. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-13Bluetooth: Add support connectable advertising settingMarcel Holtmann
The patch adds a second advertising setting that allows switching of the controller into connectable mode independent of the global connectable setting. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-12Bluetooth: Remove two else branches that are not neededMarcel Holtmann
The SMP code contains two else branches that are not needed since the successful test will actually leave the function. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-11Bluetooth: Check for matching IRK when looking for paired LE devicesJohan Hedberg
If we're given an RPA when checking whether we're paired or not, we should consult the local RPA storage whether there's a matching IRK. This we we ensure that hci_bdaddr_is_paired() gives the right result even when trying to pair a second time with the same device with an RPA. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-11Bluetooth: Fix missing rcu_read_unlock() in hci_bdaddr_is_paired()Johan Hedberg
When finding a matching LTK the rcu_read_unlock() function was failing to release the RCU read lock. This patch adds the missing call to rcu_reaD_unlock(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-11Bluetooth: Increment management interface revisionMarcel Holtmann
This patch increments the management interface revision due to introduction of new static address setting and fixes for the fast connectable feature. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-10Bluetooth: Add 'Already Paired' error for Pair Device commandJohan Hedberg
To make the behavior predictable when attempting to pair with a device for which we already have a Link Key or Long Term Key, this patch adds a new 'Already Paired' error which gets sent in such a scenario. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-10Bluetooth: Make Fast Connectable available while powered offJohan Hedberg
To maximize the usability of the Fast Connectable feature we should make it possible to set (or unset) it at any given moment. This means removing the dependency on the 'connectable' setting as well as the 'powered' setting. The former makes also sense since page scan may get enabled through add_device even if 'connectable' is false. To keep the setting available over power cycles its flag also needs to be removed from the flags that are cleared upon HCI_Reset. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-07Bluetooth: fix sco_exit compile warningAlexander Aring
While compiling the following warning occurs: WARNING: net/built-in.o(.init.text+0x602c): Section mismatch in reference from the function bt_init() to the function .exit.text:sco_exit() The function __init bt_init() references a function __exit sco_exit(). This is often seen when error handling in the init function uses functionality in the exit path. The fix is often to remove the __exit annotation of sco_exit() so it may be used outside an exit section. Since commit 6d785aa345f525e1fdf098b7c590168f0b00f3f1 ("Bluetooth: Convert mgmt to use HCI chan registration API") the function "sco_exit" is used inside of function "bt_init". The suggested solution by remove the __exit annotation solved this issue. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-06Bluetooth: Add mgmt_send_event() helper to send to any HCI channelJohan Hedberg
Currently the mgmt_event() function is only capable of sending to HCI_CHANNEL_CONTROL. To void having to change all users of it, add a new mgmt_send_event() function that takes a channel parameter, and make the old mgmt_event() a wrapper that passes MGMT_CHANNEL_CONTROL to it. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06Bluetooth: Rename pending_cmd to mgmt_pending_cmdJohan Hedberg
This patch renames the pending_cmd struct (used for tracking pending mgmt commands) to mgmt_pending_cmd, so that it can be moved to a more generic place and be used also by other modules using other HCI channels. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06Bluetooth: Rename cmd_complete() to mgmt_cmd_complete()Johan Hedberg
This patch renames the cmd_complete() function to mgmt_cmd_complete() in preparation of making it a generic helper for other modules to use too. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06Bluetooth: Rename cmd_status() to mgmt_cmd_status()Johan Hedberg
This patch renames the cmd_status() function to mgmt_cmd_status() in preparation of making it a generic helper for other modules to use too. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06Bluetooth: Move all mgmt command quirks to handler tableJohan Hedberg
In order to completely generalize the mgmt command handling we need to move away command-specific information from mgmt_control() into the actual command table. This patch adds a new 'flags' field to the handler entries which can now contain the following command specific information: - Command takes variable length parameters - Command doesn't target any specific HCI device - Command can be sent when the HCI device is unconfigured After this the mgmt_control() function is completely generic and can potentially be reused by new HCI channels. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06Bluetooth: Convert mgmt to use HCI chan registration APIJohan Hedberg
This patch converts the existing mgmt code to use the newly introduced generic API for registering HCI channels with mgmt-like semantics. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06Bluetooth: Add mgmt HCI channel registration APIJohan Hedberg
This patch adds an API for registering HCI channels with mgmt-like semantics. For now the only user will be HCI_CHANNEL_CONTROL, but e.g. 6lowpan is intended to use this as well in the future. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-03-06Bluetooth: Introduce controller setting information for static addressMarcel Holtmann
Currently it is not possible to determine if the static address is used by the controller. It is also not possible to determine if using a static on a dual-mode controller with disabled BR/EDR is possible or not. To address this issue, introduce a new setting called static-address. If support for this setting is signaled that means that the kernel supports using static addresses. And if used on dual-mode controllers with BR/EDR disabled it means that a configured static address can be used. In addition utilize the same setting for the list of current active settings that indicates if a static address is configured and if that address will be actually used. With this in mind the existing Set Static Address management command has been extended to return the current settings. That way the caller of that command can easily determine if the programmed address will be used or if extra steps are required. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05Bluetooth: fix service discovery behaviour for empty uuids filterJakub Pawlowski
This patch fixes service discovery behaviour, when provided uuid filter is empty and HCI_QUIRK_STRICT_DUPLICATE_FILTER is set. Before this patch, empty uuid filter was unable to trigger scan restart, and that caused inconsistent behaviour in applications. Example: two DBus clients call BlueZ, one to find all devices with service abcd, second to find all devices with rssi smaller than -90. Sum of those filters, that is passed to mgmt_service_scan is empty filter, with no rssi or uuids set. That caused kernel not to restart scan when quirk was set. That was inconsistent with what happen when there's only one of those two filters set (scan is restarted and reports devices). To fix that, new variable hdev->discovery.result_filtering was introduced. It can indicate that filtered scan is running, no matter what uuid or rssi filter is set. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05Bluetooth: Refactor service discovery filter logicJakub Pawlowski
This patch refactor code responsible for filtering when service discovery method is used. Previously this code was mixed with mgmt_device found logic. Now when it's in one place whole logic can be greatly simplified. That includes removing no longer necessary length field and merging checks for eir and scan_rsp. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-05Bluetooth: Move Service Discovery logic before refactoringJakub Pawlowski
This patch moves whole packet filering logic of service discovery into new function is_filter_match. It's done because logic inside mgmt_device_found is very complicated and needs some simplification. Also having whole logic in one place will allow to simplify it in the future. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-03-02filter: refactor common filter attach code into __sk_attach_progDaniel Borkmann
Both sk_attach_filter() and sk_attach_bpf() are setting up sk_filter, charging skmem and attaching it to the socket after we got the eBPF prog up and ready. Lets refactor that into a common helper. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-03-02 Here's the first bluetooth-next pull request targeting the 4.1 kernel: - ieee802154/6lowpan cleanups - SCO routing to host interface support for the btmrvl driver - AMP code cleanups - Fixes to AMP HCI init sequence - Refactoring of the HCI callback mechanism - Added shutdown routine for Intel controllers in the btusb driver - New config option to enable/disable Bluetooth debugfs information - Fix for early data reception on L2CAP fixed channels Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: Remove iocb argument from sendmsg and recvmsgYing Xue
After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02tipc: Don't use iocb argument in socket layerYing Xue
Currently the iocb argument is used to idenfiy whether or not socket lock is hold before tipc_sendmsg()/tipc_send_stream() is called. But this usage prevents iocb argument from being dropped through sendmsg() at socket common layer. Therefore, in the commit we introduce two new functions called __tipc_sendmsg() and __tipc_send_stream(). When they are invoked, it assumes that their callers have taken socket lock, thereby avoiding the weird usage of iocb argument. Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: move skb->dropcount to skb->cb[]Eyal Birger
Commit 977750076d98 ("af_packet: add interframe drop cmsg (v6)") unionized skb->mark and skb->dropcount in order to allow recording of the socket drop count while maintaining struct sk_buff size. skb->dropcount was introduced since there was no available room in skb->cb[] in packet sockets. However, its introduction led to the inability to export skb->mark, or any other aliased field to userspace if so desired. Moving the dropcount metric to skb->cb[] eliminates this problem at the expense of 4 bytes less in skb->cb[] for protocol families using it. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: add common accessor for setting dropcount on packetsEyal Birger
As part of an effort to move skb->dropcount to skb->cb[], use a common function in order to set dropcount in struct sk_buff. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: use common macro for assering skb->cb[] available size in protocol familiesEyal Birger
As part of an effort to move skb->dropcount to skb->cb[] use a common macro in protocol families using skb->cb[] for ancillary data to validate available room in skb->cb[]. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: packet: use sockaddr_ll fields as storage for skb original length in ↵Eyal Birger
recvmsg path As part of an effort to move skb->dropcount to skb->cb[], 4 bytes of additional room are needed in skb->cb[] in packet sockets. Store the skb original length in the first two fields of sockaddr_ll (sll_family and sll_protocol) as they can be derived from the skb when needed. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: rxrpc: change call to sock_recv_ts_and_drops() on rxrpc recvmsg to ↵Eyal Birger
sock_recv_timestamp() Commit 3b885787ea4112 ("net: Generalize socket rx gap / receive queue overflow cmsg") allowed receiving packet dropcount information as a socket level option. RXRPC sockets recvmsg function was changed to support this by calling sock_recv_ts_and_drops() instead of sock_recv_timestamp(). However, protocol families wishing to receive dropcount should call sock_queue_rcv_skb() or set the dropcount specifically (as done in packet_rcv()). This was not done for rxrpc and thus this feature never worked on these sockets. Formalizing this by not calling sock_recv_ts_and_drops() in rxrpc as part of an effort to move skb->dropcount into skb->cb[] Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: bluetooth: compact struct bt_skb_cb by converting boolean fields to bit ↵Eyal Birger
fields Convert boolean fields incoming and req_start to bit fields and move force_active in order save space in bt_skb_cb in an effort to use a portion of skb->cb[] for storing skb->dropcount. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02net: bluetooth: compact struct bt_skb_cb by inlining struct hci_req_ctrlEyal Birger
struct hci_req_ctrl is never used outside of struct bt_skb_cb; Inlining it frees 8 bytes on a 64 bit system in skb->cb[] allowing the addition of more ancillary data. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01cls_bpf: add initial eBPF support for programmable classifiersDaniel Borkmann
This work extends the "classic" BPF programmable tc classifier by extending its scope also to native eBPF code! This allows for user space to implement own custom, 'safe' C like classifiers (or whatever other frontend language LLVM et al may provide in future), that can then be compiled with the LLVM eBPF backend to an eBPF elf file. The result of this can be loaded into the kernel via iproute2's tc. In the kernel, they can be JITed on major archs and thus run in native performance. Simple, minimal toy example to demonstrate the workflow: #include <linux/ip.h> #include <linux/if_ether.h> #include <linux/bpf.h> #include "tc_bpf_api.h" __section("classify") int cls_main(struct sk_buff *skb) { return (0x800 << 16) | load_byte(skb, ETH_HLEN + __builtin_offsetof(struct iphdr, tos)); } char __license[] __section("license") = "GPL"; The classifier can then be compiled into eBPF opcodes and loaded via tc, for example: clang -O2 -emit-llvm -c cls.c -o - | llc -march=bpf -filetype=obj -o cls.o tc filter add dev em1 parent 1: bpf cls.o [...] As it has been demonstrated, the scope can even reach up to a fully fledged flow dissector (similarly as in samples/bpf/sockex2_kern.c). For tc, maps are allowed to be used, but from kernel context only, in other words, eBPF code can keep state across filter invocations. In future, we perhaps may reattach from a different application to those maps e.g., to read out collected statistics/state. Similarly as in socket filters, we may extend functionality for eBPF classifiers over time depending on the use cases. For that purpose, cls_bpf programs are using BPF_PROG_TYPE_SCHED_CLS program type, so we can allow additional functions/accessors (e.g. an ABI compatible offset translation to skb fields/metadata). For an initial cls_bpf support, we allow the same set of helper functions as eBPF socket filters, but we could diverge at some point in time w/o problem. I was wondering whether cls_bpf and act_bpf could share C programs, I can imagine that at some point, we introduce i) further common handlers for both (or even beyond their scope), and/or if truly needed ii) some restricted function space for each of them. Both can be abstracted easily through struct bpf_verifier_ops in future. The context of cls_bpf versus act_bpf is slightly different though: a cls_bpf program will return a specific classid whereas act_bpf a drop/non-drop return code, latter may also in future mangle skbs. That said, we can surely have a "classify" and "action" section in a single object file, or considered mentioned constraint add a possibility of a shared section. The workflow for getting native eBPF running from tc [1] is as follows: for f_bpf, I've added a slightly modified ELF parser code from Alexei's kernel sample, which reads out the LLVM compiled object, sets up maps (and dynamically fixes up map fds) if any, and loads the eBPF instructions all centrally through the bpf syscall. The resulting fd from the loaded program itself is being passed down to cls_bpf, which looks up struct bpf_prog from the fd store, and holds reference, so that it stays available also after tc program lifetime. On tc filter destruction, it will then drop its reference. Moreover, I've also added the optional possibility to annotate an eBPF filter with a name (e.g. path to object file, or something else if preferred) so that when tc dumps currently installed filters, some more context can be given to an admin for a given instance (as opposed to just the file descriptor number). Last but not least, bpf_prog_get() and bpf_prog_put() needed to be exported, so that eBPF can be used from cls_bpf built as a module. Thanks to 60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only") I think this is of no concern since anything wanting to alter eBPF opcode after verification stage would crash the kernel. [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: move read-only fields to bpf_prog and shrink bpf_prog_auxDaniel Borkmann
is_gpl_compatible and prog_type should be moved directly into bpf_prog as they stay immutable during bpf_prog's lifetime, are core attributes and they can be locked as read-only later on via bpf_prog_select_runtime(). With a bit of rearranging, this also allows us to shrink bpf_prog_aux to exactly 1 cacheline. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: add sched_cls_type and map it to sk_filter's verifier opsDaniel Borkmann
As discussed recently and at netconf/netdev01, we want to prevent making bpf_verifier_ops registration available for modules, but have them at a controlled place inside the kernel instead. The reason for this is, that out-of-tree modules can go crazy and define and register any verfifier ops they want, doing all sorts of crap, even bypassing available GPLed eBPF helper functions. We don't want to offer such a shiny playground, of course, but keep strict control to ourselves inside the core kernel. This also encourages us to design eBPF user helpers carefully and generically, so they can be shared among various subsystems using eBPF. For the eBPF traffic classifier (cls_bpf), it's a good start to share the same helper facilities as we currently do in eBPF for socket filters. That way, we have BPF_PROG_TYPE_SCHED_CLS look like it's own type, thus one day if there's a good reason to diverge the set of helper functions from the set available to socket filters, we keep ABI compatibility. In future, we could place all bpf_prog_type_list at a central place, perhaps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: remove CONFIG_BPF_SYSCALL ifdefs in socket filter codeDaniel Borkmann
This gets rid of CONFIG_BPF_SYSCALL ifdefs in the socket filter code, now that the BPF internal header can deal with it. While going over it, I also changed eBPF related functions to a sk_filter prefix to be more consistent with the rest of the file. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-01ebpf: constify various function pointer structsDaniel Borkmann
We can move bpf_map_ops and bpf_verifier_ops and other structs into ro section, bpf_map_type_list and bpf_prog_type_list into read mostly. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28tcp: cleanup static functionsEric Dumazet
tcp_fastopen_create_child() is static and should not be exported. tcp4_gso_segment() and tcp6_gso_segment() should be static. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28tcp: tso: allow CA_CWR state in tcp_tso_should_defer()Eric Dumazet
Another TCP issue is triggered by ECN. Under pressure, receiver gets ECN marks, and send back ACK packets with ECE TCP flag. Senders enter CA_CWR state. In this state, tcp_tso_should_defer() is short cut : if (icsk->icsk_ca_state != TCP_CA_Open) goto send_now; This means that about all ACK packets we receive are triggering a partial send, and because cwnd is kept small, we can only send a small amount of data for each incoming ACK, which in return generate more ACK packets. Allowing CA_Open and CA_CWR states to enable TSO defer in tcp_tso_should_defer() brings performance back : TSO autodefer has more chance to defer under pressure. This patch increases TSO and LRO/GRO efficiency back to normal levels, and does not impact overall ECN behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28tcp: tso: restore IW10 after TSO autosizingEric Dumazet
With sysctl_tcp_min_tso_segs being 4, it is very possible that tcp_tso_should_defer() decides not sending last 2 MSS of initial window of 10 packets. This also applies if autosizing decides to send X MSS per GSO packet, and cwnd is not a multiple of X. This patch implements an heuristic based on age of first skb in write queue : If it was sent very recently (less than half srtt), we can predict that no ACK packet will come in less than half rtt, so deferring might cause an under utilization of our window. This is visible on initial send (IW10) on web servers, but more generally on some RPC, as the last part of the message might need an extra RTT to get delivered. Tested: Ran following packetdrill test // A simple server-side test that sends exactly an initial window (IW10) // worth of packets. `sysctl -e -q net.ipv4.tcp_min_tso_segs=4` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.1 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 write(4, ..., 14600) = 14600 +0 > . 1:5841(5840) ack 1 win 457 +0 > . 5841:11681(5840) ack 1 win 457 // Following packet should be sent right now. +0 > P. 11681:14601(2920) ack 1 win 457 +.1 < . 1:1(0) ack 14601 win 257 +0 close(4) = 0 +0 > F. 14601:14601(0) ack 1 +.1 < F. 1:1(0) ack 14602 win 257 +0 > . 14602:14602(0) ack 2 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28tcp: tso: remove tp->tso_deferredEric Dumazet
TSO relies on ability to defer sending a small amount of packets. Heuristic is to wait for future ACKS in hope to send more packets at once. Current algorithm uses a per socket tso_deferred field as a pseudo timer. This pseudo timer relies on future ACK, but there is no guarantee we receive them in time. Fix would be to use a real timer, but cost of such timer is probably too expensive for typical cases. This patch changes the logic to test the time of last transmit, because we should not add bursts of more than 1ms for any given flow. We've used this patch for about two years at Google, before FQ/pacing as it would reduce a fair amount of bursts. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>