summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-09-30rxrpc: Fix the call timer handlingDavid Howells
The call timer's concept of a call timeout (of which there are three) that is inactive is that it is the timeout has the same expiration time as the call expiration timeout (the expiration timer is never inactive). However, I'm not resetting the timeouts when they expire, leading to repeated processing of expired timeouts when other timeout events occur. Fix this by: (1) Move the timer expiry detection into rxrpc_set_timer() inside the locked section. This means that if a timeout is set that will expire immediately, we deal with it immediately. (2) If a timeout is at or before now then it has expired. When an expiry is detected, an event is raised, the timeout is automatically inactivated and the event processor is queued. (3) If a timeout is at or after the expiry timeout then it is inactive. Inactive timeouts do not contribute to the timer setting. (4) The call timer callback can now just call rxrpc_set_timer() to handle things. (5) The call processor work function now checks the event flags rather than checking the timeouts directly. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-30rxrpc: Keep the call timeouts as ktimes rather than jiffiesDavid Howells
Keep that call timeouts as ktimes rather than jiffies so that they can be expressed as functions of RTT. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-30rxrpc: Remove error from struct rxrpc_skb_priv as it is unusedDavid Howells
Remove error from struct rxrpc_skb_priv as it is no longer used. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-30rxrpc: The offset field in struct rxrpc_skb_priv is unnecessaryDavid Howells
The offset field in struct rxrpc_skb_priv is unnecessary as the value can always be calculated. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-30rxrpc: Reduce ssthresh to peer's receive windowDavid Howells
When we receive an ACK from the peer that tells us what the peer's receive window (rwind) is, we should reduce ssthresh to rwind if rwind is smaller than ssthresh. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-30rxrpc: Switch to Congestion Avoidance mode at cwnd==ssthreshDavid Howells
Switch to Congestion Avoidance mode at cwnd == ssthresh rather than relying on cwnd getting incremented beyond ssthresh and the window size, the mode being shifted and then cwnd being corrected. We need to make sure we switch into CA mode so that we stop marking every packet for ACK. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-30ipv6 addrconf: implement RFC7559 router solicitation backoffMaciej Żenczykowski
This implements: https://tools.ietf.org/html/rfc7559 Backoff is performed according to RFC3315 section 14: https://tools.ietf.org/html/rfc3315#section-14 We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations to a negative value meaning an unlimited number of retransmits, and we make this the new default (inline with the RFC). We also add a new setting: /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval defaulting to 1 hour (per RFC recommendation). Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30net: Suppress the "Comparison to NULL could be written" warningsJia He
This is to suppress the checkpatch.pl warning "Comparison to NULL could be written". No functional changes here. Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30ipv6: Remove useless parameter in __snmp6_fill_statsdevJia He
The parameter items(is always ICMP6_MIB_MAX) is useless for __snmp6_fill_statsdev Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30proc: Reduce cache miss in xfrm_statistics_seq_showJia He
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30proc: Reduce cache miss in sctp_snmp_seq_showJia He
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30proc: Reduce cache miss in snmp6_seq_showJia He
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-30proc: Reduce cache miss in snmp_seq_showJia He
This is to use the generic interfaces snmp_get_cpu_field{,64}_batch to aggregate the data by going through all the items of each cpu sequentially. Then snmp_seq_show is split into 2 parts to avoid build warning "the frame size" larger than 1024. Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-29rxrpc: Note serial number being ACK'd in the congestion management traceDavid Howells
Note the serial number of the packet being ACK'd in the congestion management trace rather than the serial number of the ACK packet. Whilst the serial number of the ACK packet is useful for matching ACK packet in the output of wireshark, the serial number that the ACK is in response to is of more use in working out how different trace lines relate. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-29rxrpc: Request more ACKs in slow-start modeDavid Howells
Set the request-ACK on more DATA packets whilst we're in slow start mode so that we get sufficient ACKs back to supply information to configure the window. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-29rxrpc: Reduce the rxrpc_local::services list to a pointerDavid Howells
Reduce the rxrpc_local::services list to just a pointer as we don't permit multiple service endpoints to bind to a single transport endpoints (this is excluded by rxrpc_lookup_local()). The reason we don't allow this is that if you send a request to an AFS filesystem service, it will try to talk back to your cache manager on the port you sent from (this is how file change notifications are handled). To prevent someone from stealing your CM callbacks, we don't let AF_RXRPC sockets share a UDP socket if at least one of them has a service bound. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-29rxrpc: When activating client conn channels, do state check inside lockDavid Howells
In rxrpc_activate_channels(), the connection cache state is checked outside of the lock, which means it can change whilst we're waking calls up, thereby changing whether or not we're allowed to wake calls up. Fix this by moving the check inside the locked region. The check to see if all the channels are currently busy can stay outside of the locked region. Whilst we're at it: (1) Split the locked section out into its own function so that we can call it from other places in a later patch. (2) Determine the mask of channels dependent on the state as we're going to add another state in a later patch that will restrict the number of simultaneous calls to 1 on a connection. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-29rxrpc: Make Tx loss-injection go through normal return and adjust tracingDavid Howells
In rxrpc_send_data_packet() make the loss-injection path return through the same code as the transmission path so that the RTT determination is initiated and any future timer shuffling will be done, despite the packet having been binned. Whilst we're at it: (1) Add to the tx_data tracepoint an indication of whether or not we're retransmitting a data packet. (2) When we're deciding whether or not to request an ACK, rather than checking if we're in fast-retransmit mode check instead if we're retransmitting. (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're not altering the sk_buff refcount nor are we just seeing it after getting it off the Tx list. (4) The rxrpc_skb_tx_lost note is then no longer used so remove it. (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-29rxrpc: Fix exclusive client connectionsDavid Howells
Exclusive connections are currently reusable (which they shouldn't be) because rxrpc_alloc_client_connection() checks the exclusive flag in the rxrpc_connection struct before it's initialised from the function parameters. This means that the DONT_REUSE flag doesn't get set. Fix this by checking the function parameters for the exclusive flag. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-28net: do not export sk_stream_write_spaceEric Dumazet
Since commit 900f65d361d3 ("tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()") we no longer need to export sk_stream_write_space() From: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28tcp: Change txhash on every SYN and RTO retransmitLawrence Brakmo
The current code changes txhash (flowlables) on every retransmitted SYN/ACK, but only after the 2nd retransmitted SYN and only after tcp_retries1 RTO retransmits. With this patch: 1) txhash is changed with every SYN retransmits 2) txhash is changed with every RTO. The result is that we can start re-routing around failed (or very congested paths) as soon as possible. Otherwise application health checks may fail and the connection may be terminated before we start to change txhash. v4: Removed sysctl, txhash is changed for all RTOs v3: Removed text saying default value of sysctl is 0 (it is 100) v2: Added sysctl documentation and cleaned code Tested with packetdrill tests Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28switchdev: remove FIB offload infrastructureJiri Pirko
Since this is now taken care of by FIB notifier, remove the code, with all unused dependencies. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28fib: introduce FIB info offload flag helpersJiri Pirko
These helpers are to be used in case someone offloads the FIB entry. The result is that if the entry is offloaded to at least one device, the offload flag is set. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28fib: introduce FIB notification infrastructureJiri Pirko
This allows to pass information about added/deleted FIB entries/rules to whoever is interested. This is done in a very similar way as devinet notifies address additions/removals. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28net/sched: cls_flower: Use a proper mask value for enc key id parameterHadar Hen Zion
The current code use the encapsulation key id value as the mask of that parameter which is wrong. Fix that by using a full mask. Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-09-25 Here are a few more Bluetooth & 802.15.4 patches for the 4.9 kernel that have popped up during the past week: - New USB ID for QCA_ROME Bluetooth device - NULL pointer dereference fix for Bluetooth mgmt sockets - Fixes for BCSP driver - Fix for updating LE scan response Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25Merge branch 'master' of ↵Pablo Neira Ayuso
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Conflicts: net/netfilter/core.c net/netfilter/nf_tables_netdev.c Resolve two conflicts before pull request for David's net-next tree: 1) Between c73c24849011 ("netfilter: nf_tables_netdev: remove redundant ip_hdr assignment") from the net tree and commit ddc8b6027ad0 ("netfilter: introduce nft_set_pktinfo_{ipv4, ipv6}_validate()"). 2) Between e8bffe0cf964 ("net: Add _nf_(un)register_hooks symbols") and Aaron Conole's patches to replace list_head with single linked list. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nf_log: get rid of XT_LOG_* macrosLiping Zhang
nf_log is used by both nftables and iptables, so use XT_LOG_XXX macros here is not appropriate. Replace them with NF_LOG_XXX. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_log: complete NFTA_LOG_FLAGS attr supportLiping Zhang
NFTA_LOG_FLAGS attribute is already supported, but the related NF_LOG_XXX flags are not exposed to the userspace. So we cannot explicitly enable log flags to log uid, tcp sequence, ip options and so on, i.e. such rule "nft add rule filter output log uid" is not supported yet. So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In order to keep consistent with other modules, change NF_LOG_MASK to refer to all supported log flags. On the other hand, add a new NF_LOG_DEFAULT_MASK to refer to the original default log flags. Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the userspace. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nf_tables: add range expressionPablo Neira Ayuso
Inverse ranges != [a,b] are not currently possible because rules are composites of && operations, and we need to express this: data < a || data > b This patch adds a new range expression. Positive ranges can be already through two cmp expressions: cmp(sreg, data, >=) cmp(sreg, data, <=) This new range expression provides an alternative way to express this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: xt_socket: fix transparent match for IPv6 request socketsKOVACS Krisztian
The introduction of TCP_NEW_SYN_RECV state, and the addition of request sockets to the ehash table seems to have broken the --transparent option of the socket match for IPv6 (around commit a9407000). Now that the socket lookup finds the TCP_NEW_SYN_RECV socket instead of the listener, the --transparent option tries to match on the no_srccheck flag of the request socket. Unfortunately, that flag was only set for IPv4 sockets in tcp_v4_init_req() by copying the transparent flag of the listener socket. This effectively causes '-m socket --transparent' not match on the ACK packet sent by the client in a TCP handshake. Based on the suggestion from Eric Dumazet, this change moves the code initializing no_srccheck to tcp_conn_request(), rendering the above scenario working again. Fixes: a940700003 ("netfilter: xt_socket: prepare for TCP_NEW_SYN_RECV support") Signed-off-by: Alex Badics <alex.badics@balabit.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: evict stale entries when user reads /proc/net/nf_conntrackFlorian Westphal
Fabian reports a possible conntrack memory leak (could not reproduce so far), however, one minor issue can be easily resolved: > cat /proc/net/nf_conntrack | wc -l = 5 > 4 minutes required to clean up the table. We should not report those timed-out entries to the user in first place. And instead of just skipping those timed-out entries while iterating over the table we can also zap them (we already do this during ctnetlink walks, but I forgot about the /proc interface). Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer") Reported-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: xt_hashlimit: Create revision 2 to support higher pps ratesVishwanath Pai
Create a new revision for the hashlimit iptables extension module. Rev 2 will support higher pps of upto 1 million, Version 1 supports only 10k. To support this we have to increase the size of the variables avg and burst in hashlimit_cfg to 64-bit. Create two new structs hashlimit_cfg2 and xt_hashlimit_mtinfo2 and also create newer versions of all the functions for match, checkentry and destroy. Some of the functions like hashlimit_mt, hashlimit_mt_check etc are very similar in both rev1 and rev2 with only minor changes, so I have split those functions and moved all the common code to a *_common function. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: xt_hashlimit: Prepare for revision 2Vishwanath Pai
I am planning to add a revision 2 for the hashlimit xtables module to support higher packets per second rates. This patch renames all the functions and variables related to revision 1 by adding _v1 at the end of the names. Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_ct: report error if mark and dir specified simultaneouslyLiping Zhang
NFT_CT_MARK is unrelated to direction, so if NFTA_CT_DIRECTION attr is specified, report EINVAL to the userspace. This validation check was already done at nft_ct_get_init, but we missed it in nft_ct_set_init. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: nft_ct: unnecessary to require dir when use ct l3proto/protocolLiping Zhang
Currently, if the user want to match ct l3proto, we must specify the direction, for example: # nft add rule filter input ct original l3proto ipv4 ^^^^^^^^ Otherwise, error message will be reported: # nft add rule filter input ct l3proto ipv4 nft add rule filter input ct l3proto ipv4 <cmdline>:1:1-38: Error: Could not process rule: Invalid argument add rule filter input ct l3proto ipv4 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Actually, there's no need to require NFTA_CT_DIRECTION attr, because ct l3proto and protocol are unrelated to direction. And for compatibility, even if the user specify the NFTA_CT_DIRECTION attr, do not report error, just skip it. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ackGao Feng
It is valid that the TCP RST packet which does not set ack flag, and bytes of ack number are zero. But current seqadj codes would adjust the "0" ack to invalid ack number. Actually seqadj need to check the ack flag before adjust it for these RST packets. The following is my test case client is 10.26.98.245, and add one iptable rule: iptables -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2: --connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with tcp-reset This iptables rule could generate on TCP RST without ack flag. server:10.172.135.55 Enable the synproxy with seqadjust by the following iptables rules iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m tcp --syn -j CT --notrack iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack --ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460 iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack --ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT The following is my test result. 1. packet trace on client root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829, win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7], length 0 IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266, ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884, nop,wscale 7], length 0 IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229, options [nop,nop,TS val 452367885 ecr 15643479], length 0 IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226, options [nop,nop,TS val 15643479 ecr 452367885], length 0 IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830, win 0, length 0 2. seqadj log on server [62873.867319] Adjusting sequence number from 602341895->546723267, ack from 3695959830->3695959830 [62873.867644] Adjusting sequence number from 602341895->546723267, ack from 3695959830->3695959830 [62873.869040] Adjusting sequence number from 3695959830->3695959830, ack from 0->55618628 To summarize, it is clear that the seqadj codes adjust the 0 ack when receive one TCP RST packet without ack. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: replace list_head with single linked listAaron Conole
The netfilter hook list never uses the prev pointer, and so can be trimmed to be a simple singly-linked list. In addition to having a more light weight structure for hook traversal, struct net becomes 5568 bytes (down from 6400) and struct net_device becomes 2176 bytes (down from 2240). Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25Merge tag 'rxrpc-rewrite-20160924' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Implement slow-start and other bits This set of patches implements the RxRPC slow-start feature for AF_RXRPC to improve performance and handling of occasional packet loss. This is more or less the same as TCP slow start [RFC 5681]. Firstly, there are some ACK generation improvements: (1) Send ACKs regularly to apprise the peer of our state so that they can do congestion management of their own. (2) Send an ACK when we fill in a hole in the buffer so that the peer can find out that we did this thus forestalling retransmission. (3) Note the final DATA packet's serial number in the final ACK for correlation purposes. and a couple of bug fixes: (4) Reinitialise the ACK state and clear the ACK and resend timers upon entering the client reply reception phase to kill off any pending probe ACKs. (5) Delay the resend timer to allow for nsec->jiffies conversion errors. and then there's the slow-start pieces: (6) Summarise an ACK. (7) Schedule a PING or IDLE ACK if the reply to a client call is overdue to try and find out what happened to it. (8) Implement the slow start feature. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24gre: use nla_get_be32() to extract flowinfoLance Richardson
Eliminate a sparse endianness mismatch warning, use nla_get_be32() to extract a __be32 value instead of nla_get_u32(). Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-24rxrpc: Implement slow-startDavid Howells
Implement RxRPC slow-start, which is similar to RFC 5681 for TCP. A tracepoint is added to log the state of the congestion management algorithm and the decisions it makes. Notes: (1) Since we send fixed-size DATA packets (apart from the final packet in each phase), counters and calculations are in terms of packets rather than bytes. (2) The ACK packet carries the equivalent of TCP SACK. (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly suited to SACK of a small number of packets. It seems that, almost inevitably, by the time three 'duplicate' ACKs have been seen, we have narrowed the loss down to one or two missing packets, and the FLIGHT_SIZE calculation ends up as 2. (4) In rxrpc_resend(), if there was no data that apparently needed retransmission, we transmit a PING ACK to ask the peer to tell us what its Rx window state is. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Schedule an ACK if the reply to a client call appears overdueDavid Howells
If we've sent all the request data in a client call but haven't seen any sign of the reply data yet, schedule an ACK to be sent to the server to find out if the reply data got lost. If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to demand a response to find out whether we need to retransmit. If the server says it has received all of the data, we send an IDLE ACK to tell the server that we haven't received anything in the receive phase as yet. To make this work, a non-immediate PING ACK must carry a delay. I've chosen the same as the IDLE ACK for the moment. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Generate a summary of the ACK state for later useDavid Howells
Generate a summary of the Tx buffer packet state when an ACK is received for use in a later patch that does congestion management. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Delay the resend timer to allow for nsec->jiffies conv errorDavid Howells
When determining the resend timer value, we have a value in nsec but the timer is in jiffies which may be a million or more times more coarse. nsecs_to_jiffies() rounds down - which means that the resend timeout expressed as jiffies is very likely earlier than the one expressed as nanoseconds from which it was derived. The problem is that rxrpc_resend() gets triggered by the timer, but can't then find anything to resend yet. It sets the timer again - but gets kicked off immediately again and again until the nanosecond-based expiry time is reached and we actually retransmit. Fix this by adding 1 to the jiffies-based resend_at value to counteract the rounding and make sure that the timer happens after the nanosecond-based expiry is passed. Alternatives would be to adjust the timestamp on the packets to align with the jiffie scale or to switch back to using jiffie-timestamps. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Reinitialise the call ACK and timer state for client reply phaseDavid Howells
Clear the ACK reason, ACK timer and resend timer when entering the client reply phase when the first DATA packet is received. New ACKs will be proposed once the data is queued. The resend timer is no longer relevant and we need to cancel ACKs scheduled to probe for a lost reply. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Include the last reply DATA serial number in the final ACKDavid Howells
In a client call, include the serial number of the last DATA packet of the reply in the final ACK. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24rxrpc: Send an immediate ACK if we fill in a holeDavid Howells
Send an immediate ACK if we fill in a hole in the buffer left by an out-of-sequence packet. This may allow the congestion management in the peer to avoid a retransmission if packets got reordered on the wire. Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-24netfilter: Only allow sane values in nf_register_net_hookAaron Conole
This commit adds an upfront check for sane values to be passed when registering a netfilter hook. This will be used in a future patch for a simplified hook list traversal. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24netfilter: Remove explicit rcu_read_lock in nf_hook_slowAaron Conole
All of the callers of nf_hook_slow already hold the rcu_read_lock, so this cleanup removes the recursive call. This is just a cleanup, as the locking code gracefully handles this situation. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-24netfilter: call nf_hook_ingress with rcu_read_lockAaron Conole
This commit ensures that the rcu read-side lock is held while the ingress hook is called. This ensures that a call to nf_hook_slow (and ultimately nf_ingress) will be read protected. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>