Age | Commit message (Collapse) | Author |
|
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.
The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.
At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.
The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.
During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.
A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.
For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.
This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
This patch adds the hlist_nulls_add_head() function which is
based on hlist_nulls_add_head_rcu() but without the use of
rcu_assign_pointer(). It also adds hlist_nulls_del which is
exactly the same like hlist_nulls_del_rcu().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
This patch moves the helper destruction to a function that lives
in nf_conntrack_helper.c. This new function is used in the patch
to add ctnetlink reliable event delivery.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
This patch reworks the per-cpu event caching to use the conntrack
extension infrastructure.
The main drawback is that we consume more memory per conntrack
if event delivery is enabled. This patch is required by the
reliable event delivery that follows to this patch.
BTW, this patch allows you to enable/disable event delivery via
/proc/sys/net/netfilter/nf_conntrack_events in runtime, although
you can still disable event caching as compilation option.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax
|
|
Remove unused stuff.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6
|
|
The constant is being use as an alignment factor, not as a padding
factor; made reading/reviewing the code quite confusing.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
|
|
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.
This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.
We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.
We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.
As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.
skb_set_owner_w() doesnt call sock_hold() anymore
sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.
Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.
Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is available in a standard MDIO register in 10GBASE-T PHYs.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Once rfkill-input is disabled, the "global" states will only be used as
default initial states.
Since the states will always be the same after resume, we shouldn't
generate events on resume.
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
rfkill_set_global_sw_state() (previously rfkill_set_default()) will no
longer be exported by the rewritten rfkill core.
Instead, platform drivers which can provide persistent soft-rfkill state
across power-down/reboot should indicate their initial state by calling
rfkill_set_sw_state() before registration. Otherwise, they will be
initialized to a default value during registration by a set_block call.
We remove existing calls to rfkill_set_sw_state() which happen before
registration, since these had no effect in the old model. If these
drivers do have persistent state, the calls can be put back (subject
to testing :-). This affects hp-wmi and acer-wmi.
Drivers with persistent state will affect the global state only if
rfkill-input is enabled. This is required, otherwise booting with
wireless soft-blocked and pressing the wireless-toggle key once would
have no apparent effect. This special case will be removed in future
along with rfkill-input, in favour of a more flexible userspace daemon
(see Documentation/feature-removal-schedule.txt).
Now rfkill_global_states[n].def is only used to preserve global states
over EPO, it is renamed to ".sav".
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
In order to handle powersave frames properly we had needed
to pass these out to the device queues again, and introduce
the skb->requeue bit. This, however, also has unnecessary
overhead by needing to 'clean up' already tried frames, and
this clean-up code is also buggy when software encryption
is used.
Instead of sending the frames via the master netdev queue
again, simply put them into the pending queue. This also
fixes a problem where frames for that particular station
could be reordered when some were still on the software
queues and older ones are re-injected into the software
queue after them.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This removes the dependency on GPIO framework and lets the SPI host
driver handle the chip select. The SPI host driver is required to keep
the CS active for the entire message unless cs_change says otherwise.
This patch collects the two/three single SPI transfers into a message.
Also the delay in read path in case use_dummy_writes are not used is
moved into the SPI host driver.
Tested-by: Mike Rapoport <mike@compulab.co.il>
Tested-by: Andrey Yurovsky <andrey@cozybit.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Introduce per-conntrack locks and use them instead of the global protocol
locks to avoid contention. Especially tcp_lock shows up very high in
profiles on larger machines.
This will also allow to simplify the upcoming reliable event delivery patches.
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Since we use ERR_PTR and similar macros, we need to include
linux/err.h.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the XT_SOCKET_TRANSPARENT flag is set, enabled 'transparent'
socket option is required for the socket to be matched.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Ethtool is a standard way of getting information about
ethernet interfaces. We enhance ethtool kernel interface
& e1000e to make the MDI-X status readable via ethtool in
userspace.
Signed-off-by: Chaitanya Lala <clala@riverbed.com>
Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a netlink interface for configuration of IEEE 802.15.4 device. Also this
interface specifies events notification sent by devices towards higher layers.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for communication over IEEE 802.15.4 networks. This implementation
is neither certified nor complete, but aims to that goal. This commit contains
only the socket interface for communication over IEEE 802.15.4 networks.
One can either send RAW datagrams or use SOCK_DGRAM to encapsulate data
inside normal IEEE 802.15.4 packets.
Configuration interface, drivers and software MAC 802.15.4 implementation will
follow.
Initial implementation was done by Maxim Gorbachyov, Maxim Osipov and Pavel
Smolensky as a research project at Siemens AG. Later the stack was heavily
reworked to better suit the linux networking model, and is now maitained
as an open project partially sponsored by Siemens.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IEEE 802.15.4 stack requires several constants to be defined/adjusted.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change PSCHED_SHIFT from 10 to 6 to increase schedulers time
resolution. This will increase 16x a number of (internal) ticks per
nanosecond, and is needed to improve accuracy of schedulers based on
rate tables, like HTB, TBF or CBQ, with rates above 100Mbit. It is
assumed this change is safe for 32bit accounting of time diffs up
to 2 minutes, which should be enough for common use (extremely low
rate values may overflow, so get inaccurate instead). To make full
use of this change an updated iproute2 will be needed. (But using
older iproute2 should be safe too.)
This change breaks ticks - microseconds similarity, so some minor code
fixes might be needed. It is also planned to change naming adequately
eg. to PSCHED_TICKS2NS() etc. in the near future.
Reported-by: Antonio Almeida <vexwek@gmail.com>
Tested-by: Antonio Almeida <vexwek@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and
PSCHED_NS2US() macros to enable changing this value later.
Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT
definitions. This part of the patch is based on feedback from
Patrick McHardy <kaber@trash.net>.
Reported-by: Antonio Almeida <vexwek@gmail.com>
Tested-by: Antonio Almeida <vexwek@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the hope that these can be used to eliminate direct
references to the frag list implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Furthermore, it twiddles with the details of SKB list handling
directly, which we're trying to eliminate.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Passive OS fingerprinting netfilter module allows to passively detect
remote OS and perform various netfilter actions based on that knowledge.
This module compares some data (WS, MSS, options and it's order, ttl, df
and others) from packets with SYN bit set with dynamically loaded OS
fingerprints.
Fingerprint matching rules can be downloaded from OpenBSD source tree
or found in archive and loaded via netfilter netlink subsystem into
the kernel via special util found in archive.
Archive contains library file (also attached), which was shipped
with iptables extensions some time ago (at least when ipt_osf existed
in patch-o-matic).
Following changes were made in this release:
* added NLM_F_CREATE/NLM_F_EXCL checks
* dropped _rcu list traversing helpers in the protected add/remove calls
* dropped unneded structures, debug prints, obscure comment and check
Fingerprints can be downloaded from
http://www.openbsd.org/cgi-bin/cvsweb/src/etc/pf.os
or can be found in archive
Example usage:
-d switch removes fingerprints
Please consider for inclusion.
Thank you.
Passive OS fingerprint homepage (archives, examples):
http://www.ioremap.net/projects/osf
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Current conntrack code kills the ICMP conntrack entry as soon as
the first reply is received. This is incorrect, as we then see only
the first ICMP echo reply out of several possible duplicates as
ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
increases the conntrackd traffic on H-A firewalls.
Make all the ICMP conntrack entries (including the replied ones)
last for the default of nf_conntrack_icmp{,v6}_timeout seconds.
Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
With the re-write of the RFKILL subsystem it is now possible to easily
integrate RFKILL soft-switch support into the Bluetooth subsystem. All
Bluetooth devices will now get automatically RFKILL support.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
The Bluetooth source uses some endian conversion helpers, that in the end
translate to kernel standard routines. So remove this obfuscation since it
is fully pointless.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
This adds the basic constants required to add support for L2CAP Enhanced
Retransmission feature.
Based on a patch from Nathan Holstein <nathan@lampreynetworks.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Using the L2CAP_CONF_HINT macro is easier to understand than using a
hardcoded 0x80 value.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Use macros instead of hardcoded numbers to make the L2CAP source code
more readable.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Change the name of the Kernel CAPI exported function capi_ctr_reseted()
to something representing its purpose better.
Impact: renaming, no functional change
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_dma_unmap() is quite expensive for small packets,
because we use two different cache lines from skb_shared_info.
One to access nr_frags, one to access dma_maps[0]
Instead of dma_maps being an array of MAX_SKB_FRAGS + 1 elements,
let dma_head alone in a new dma_head field, close to nr_frags,
to reduce cache lines misses.
Tested on my dev machine (bnx2 & tg3 adapters), nice speedup !
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Get rid of num_dma_maps in struct skb_shared_info, as it seems unused.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
|
|
This patch removes the forcedeth device ids from pci_ids.h
The forcedeth driver uses the device id constants directly in its source
file.
[ Need to keep PCI_DEVICE_ID_NVIDIA_NVENET_15 in order to keep
drivers/pci/quirks.c building -DaveM ]
Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds support for specifying a range of queues instead of a single queue
id. Flows will be distributed across the given range.
This is useful for multicore systems: Instead of having a single
application read packets from a queue, start multiple
instances on queues x, x+1, .. x+n. Each instance can process
flows independently.
Packets for the same connection are put into the same queue.
Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev
|
|
As it stands skb fraglists can get past the check in dev_queue_xmit
if the skb is marked as GSO. In particular, if the packet doesn't
have the proper checksums for GSO, but can otherwise be handled by
the underlying device, we will not perform the fraglist check on it
at all.
If the underlying device cannot handle fraglists, then this will
break.
The fix is as simple as moving the fraglist check from the device
check into skb_gso_ok.
This has caused crashes with Xen when used together with GRO which
can generate GSO packets with fraglists.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To be easier on drivers and users, have cfg80211 register an
rfkill structure that drivers can access. When soft-killed,
simply take down all interfaces; when hard-killed the driver
needs to notify us and we will take down the interfaces
after the fact. While rfkilled, interfaces cannot be set UP.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Sometimes it is necessary to know how the state is,
and it is easier to query rfkill than keep track of
it somewhere else, so add a function for that. This
could later be expanded to return hard/soft block,
but so far that isn't necessary.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This patch introduces new cfg80211 API to set the TX power
via cfg80211, puts the wext code into cfg80211 and updates
mac80211 to use all that. The -ENETDOWN bits are a hack but
will go away soon.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
The new code added by this patch will make rfkill create
a misc character device /dev/rfkill that userspace can use
to control rfkill soft blocks and get status of devices as
well as events when the status changes.
Using it is very simple -- when you open it you can read
a number of times to get the initial state, and every
further read blocks (you can poll) on getting the next
event from the kernel. The same structure you read is
also used when writing to it to change the soft block of
a given device, all devices of a given type, or all
devices.
This also makes CONFIG_RFKILL_INPUT selectable again in
order to be able to test without it present since its
functionality can now be replaced by userspace entirely
and distros and users may not want the input part of
rfkill interfering with their userspace code. We will
also write a userspace daemon to handle all that and
consequently add the input code to the feature removal
schedule.
In order to have rfkilld support both kernels with and
without CONFIG_RFKILL_INPUT (or new kernels after its
eventual removal) we also add an ioctl (that only exists
if rfkill-input is present) to disable rfkill-input.
It is not very efficient, but at least gives the correct
behaviour in all cases.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This patch completely rewrites the rfkill core to address
the following deficiencies:
* all rfkill drivers need to implement polling where necessary
rather than having one central implementation
* updating the rfkill state cannot be done from arbitrary
contexts, forcing drivers to use schedule_work and requiring
lots of code
* rfkill drivers need to keep track of soft/hard blocked
internally -- the core should do this
* the rfkill API has many unexpected quirks, for example being
asymmetric wrt. alloc/free and register/unregister
* rfkill can call back into a driver from within a function the
driver called -- this is prone to deadlocks and generally
should be avoided
* rfkill-input pointlessly is a separate module
* drivers need to #ifdef rfkill functions (unless they want to
depend on or select RFKILL) -- rfkill should provide inlines
that do nothing if it isn't compiled in
* the rfkill structure is not opaque -- drivers need to initialise
it correctly (lots of sanity checking code required) -- instead
force drivers to pass the right variables to rfkill_alloc()
* the documentation is hard to read because it always assumes the
reader is completely clueless and contains way TOO MANY CAPS
* the rfkill code needlessly uses a lot of locks and atomic
operations in locked sections
* fix LED trigger to actually change the LED when the radio state
changes -- this wasn't done before
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> [thinkpad]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
NETDEV_UP is called after the device is set UP, but sometimes
it is useful to be able to veto the device UP. Introduce a
new NETDEV_PRE_UP notifier that can be used for exactly this.
The first use case will be cfg80211 denying interfaces to be
set UP if the device is known to be rfkill'ed.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|