<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/net/sock.h, branch v2.6.31.12</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>net: sock_copy() fixes</title>
<updated>2009-07-17T01:05:26+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-07-15T23:13:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4dc6dc7162c08b9965163c9ab3f9375d4adff2c7'/>
<id>4dc6dc7162c08b9965163c9ab3f9375d4adff2c7</id>
<content type='text'>
Commit e912b1142be8f1e2c71c71001dc992c6e5eb2ec1
(net: sk_prot_alloc() should not blindly overwrite memory)
took care of not zeroing whole new socket at allocation time.

sock_copy() is another spot where we should be very careful.
We should not set refcnt to a non null value, until
we are sure other fields are correctly setup, or
a lockless reader could catch this socket by mistake,
while not fully (re)initialized.

This patch puts sk_node &amp; sk_refcnt to the very beginning
of struct sock to ease sock_copy() &amp; sk_prot_alloc() job.

We add appropriate smp_wmb() before sk_refcnt initializations
to match our RCU requirements (changes to sock keys should
be committed to memory before sk_refcnt setting)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit e912b1142be8f1e2c71c71001dc992c6e5eb2ec1
(net: sk_prot_alloc() should not blindly overwrite memory)
took care of not zeroing whole new socket at allocation time.

sock_copy() is another spot where we should be very careful.
We should not set refcnt to a non null value, until
we are sure other fields are correctly setup, or
a lockless reader could catch this socket by mistake,
while not fully (re)initialized.

This patch puts sk_node &amp; sk_refcnt to the very beginning
of struct sock to ease sock_copy() &amp; sk_prot_alloc() job.

We add appropriate smp_wmb() before sk_refcnt initializations
to match our RCU requirements (changes to sock keys should
be committed to memory before sk_refcnt setting)

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>memory barrier: adding smp_mb__after_lock</title>
<updated>2009-07-10T00:06:58+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@redhat.com</email>
</author>
<published>2009-07-08T12:10:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ad46276952f1af34cd91d46d49ba13d347d56367'/>
<id>ad46276952f1af34cd91d46d49ba13d347d56367</id>
<content type='text'>
Adding smp_mb__after_lock define to be used as a smp_mb call after
a lock.

Making it nop for x86, since {read|write|spin}_lock() on x86 are
full memory barriers.

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adding smp_mb__after_lock define to be used as a smp_mb call after
a lock.

Making it nop for x86, since {read|write|spin}_lock() on x86 are
full memory barriers.

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: adding memory barrier to the poll and receive callbacks</title>
<updated>2009-07-10T00:06:57+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@redhat.com</email>
</author>
<published>2009-07-08T12:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a57de0b4336e48db2811a2030bb68dba8dd09d88'/>
<id>a57de0b4336e48db2811a2030bb68dba8dd09d88</id>
<content type='text'>
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp-&gt;rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp-&gt;rcv_nxt
  ...                        ...
  tp-&gt;rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk-&gt;sk_sleep &amp;&amp; waitqueue_active(sk-&gt;sk_sleep))
                                        wake_up_interruptible(sk-&gt;sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.

Meaning that once tp-&gt;rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp-&gt;rcv_nxt check and sleeps, or will get the new value for
tp-&gt;rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp-&gt;rcv_nxt update on CPU2 side.  The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.

Calls to poll_wait in following modules were ommited:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp-&gt;rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp-&gt;rcv_nxt
  ...                        ...
  tp-&gt;rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk-&gt;sk_sleep &amp;&amp; waitqueue_active(sk-&gt;sk_sleep))
                                        wake_up_interruptible(sk-&gt;sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.

Meaning that once tp-&gt;rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp-&gt;rcv_nxt check and sleeps, or will get the new value for
tp-&gt;rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp-&gt;rcv_nxt update on CPU2 side.  The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.

Calls to poll_wait in following modules were ommited:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa &lt;jolsa@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6</title>
<updated>2009-06-24T17:01:12+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-06-24T17:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=09ce42d3167e3f20b501fa780c2415332330fac5'/>
<id>09ce42d3167e3f20b501fa780c2415332330fac5</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6:
  bnx2: Fix the behavior of ethtool when ONBOOT=no
  qla3xxx: Don't sleep while holding lock.
  qla3xxx: Give the PHY time to come out of reset.
  ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off
  net: Move rx skb_orphan call to where needed
  ipv6: Use correct data types for ICMPv6 type and code
  net: let KS8842 driver depend on HAS_IOMEM
  can: let SJA1000 driver depend on HAS_IOMEM
  netxen: fix firmware init handshake
  netxen: fix build with without CONFIG_PM
  netfilter: xt_rateest: fix comparison with self
  netfilter: xt_quota: fix incomplete initialization
  netfilter: nf_log: fix direct userspace memory access in proc handler
  netfilter: fix some sparse endianess warnings
  netfilter: nf_conntrack: fix conntrack lookup race
  netfilter: nf_conntrack: fix confirmation race condition
  netfilter: nf_conntrack: death_by_timeout() fix
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6:
  bnx2: Fix the behavior of ethtool when ONBOOT=no
  qla3xxx: Don't sleep while holding lock.
  qla3xxx: Give the PHY time to come out of reset.
  ipv4 routing: Ensure that route cache entries are usable and reclaimable with caching is off
  net: Move rx skb_orphan call to where needed
  ipv6: Use correct data types for ICMPv6 type and code
  net: let KS8842 driver depend on HAS_IOMEM
  can: let SJA1000 driver depend on HAS_IOMEM
  netxen: fix firmware init handshake
  netxen: fix build with without CONFIG_PM
  netfilter: xt_rateest: fix comparison with self
  netfilter: xt_quota: fix incomplete initialization
  netfilter: nf_log: fix direct userspace memory access in proc handler
  netfilter: fix some sparse endianess warnings
  netfilter: nf_conntrack: fix conntrack lookup race
  netfilter: nf_conntrack: fix confirmation race condition
  netfilter: nf_conntrack: death_by_timeout() fix
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Move rx skb_orphan call to where needed</title>
<updated>2009-06-23T23:36:25+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2009-06-22T02:25:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d55d87fdff8252d0e2f7c28c2d443aee17e9d70f'/>
<id>d55d87fdff8252d0e2f7c28c2d443aee17e9d70f</id>
<content type='text'>
In order to get the tun driver to account packets, we need to be
able to receive packets with destructors set.  To be on the safe
side, I added an skb_orphan call for all protocols by default since
some of them (IP in particular) cannot handle receiving packets
destructors properly.

Now it seems that at least one protocol (CAN) expects to be able
to pass skb-&gt;sk through the rx path without getting clobbered.

So this patch attempts to fix this properly by moving the skb_orphan
call to where it's actually needed.  In particular, I've added it
to skb_set_owner_[rw] which is what most users of skb-&gt;destructor
call.

This is actually an improvement for tun too since it means that
we only give back the amount charged to the socket when the skb
is passed to another socket that will also be charged accordingly.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Tested-by: Oliver Hartkopp &lt;olver@hartkopp.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order to get the tun driver to account packets, we need to be
able to receive packets with destructors set.  To be on the safe
side, I added an skb_orphan call for all protocols by default since
some of them (IP in particular) cannot handle receiving packets
destructors properly.

Now it seems that at least one protocol (CAN) expects to be able
to pass skb-&gt;sk through the rx path without getting clobbered.

So this patch attempts to fix this properly by moving the skb_orphan
call to where it's actually needed.  In particular, I've added it
to skb_set_owner_[rw] which is what most users of skb-&gt;destructor
call.

This is actually an improvement for tun too since it means that
we only give back the amount charged to the socket when the skb
is passed to another socket that will also be charged accordingly.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Tested-by: Oliver Hartkopp &lt;olver@hartkopp.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6</title>
<updated>2009-06-18T21:07:15+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-06-18T21:07:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d2aa4550379f92e929af7ed1dd4f55e6a1e331f8'/>
<id>d2aa4550379f92e929af7ed1dd4f55e6a1e331f8</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
  netxen: fix tx ring accounting
  netxen: fix detection of cut-thru firmware mode
  forcedeth: fix dma api mismatches
  atm: sk_wmem_alloc initial value is one
  net: correct off-by-one write allocations reports
  via-velocity : fix no link detection on boot
  Net / e100: Fix suspend of devices that cannot be power managed
  TI DaVinci EMAC : Fix rmmod error
  net: group address list and its count
  ipv4: Fix fib_trie rebalancing, part 2
  pkt_sched: Update drops stats in act_police
  sky2: version 1.23
  sky2: add GRO support
  sky2: skb recycling
  sky2: reduce default transmit ring
  sky2: receive counter update
  sky2: fix shutdown synchronization
  sky2: PCI irq issues
  sky2: more receive shutdown
  sky2: turn off pause during shutdown
  ...

Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
  netxen: fix tx ring accounting
  netxen: fix detection of cut-thru firmware mode
  forcedeth: fix dma api mismatches
  atm: sk_wmem_alloc initial value is one
  net: correct off-by-one write allocations reports
  via-velocity : fix no link detection on boot
  Net / e100: Fix suspend of devices that cannot be power managed
  TI DaVinci EMAC : Fix rmmod error
  net: group address list and its count
  ipv4: Fix fib_trie rebalancing, part 2
  pkt_sched: Update drops stats in act_police
  sky2: version 1.23
  sky2: add GRO support
  sky2: skb recycling
  sky2: reduce default transmit ring
  sky2: receive counter update
  sky2: fix shutdown synchronization
  sky2: PCI irq issues
  sky2: more receive shutdown
  sky2: turn off pause during shutdown
  ...

Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sk_wmem_alloc has initial value of one, not zero</title>
<updated>2009-06-17T11:31:25+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-06-16T10:12:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c564039fd83ea16a86a96d52632794b24849e507'/>
<id>c564039fd83ea16a86a96d52632794b24849e507</id>
<content type='text'>
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

Some protocols check sk_wmem_alloc value to determine if a timer
must delay socket deallocation. We must take care of the sk_wmem_alloc
value being one instead of zero when no write allocations are pending.

Reported by Ingo Molnar, and full diagnostic from David Miller.

This patch introduces three helpers to get read/write allocations
and a followup patch will use these helpers to report correct
write allocations to user.

Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.

Some protocols check sk_wmem_alloc value to determine if a timer
must delay socket deallocation. We must take care of the sk_wmem_alloc
value being one instead of zero when no write allocations are pending.

Reported by Ingo Molnar, and full diagnostic from David Miller.

This patch introduces three helpers to get read/write allocations
and a followup patch will use these helpers to report correct
write allocations to user.

Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck</title>
<updated>2009-06-16T20:09:51+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-06-16T20:09:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b3fec0fe35a4ff048484f1408385a27695d4273b'/>
<id>b3fec0fe35a4ff048484f1408385a27695d4273b</id>
<content type='text'>
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
  signal: fix __send_signal() false positive kmemcheck warning
  fs: fix do_mount_root() false positive kmemcheck warning
  fs: introduce __getname_gfp()
  trace: annotate bitfields in struct ring_buffer_event
  net: annotate struct sock bitfield
  c2port: annotate bitfield for kmemcheck
  net: annotate inet_timewait_sock bitfields
  ieee1394/csr1212: fix false positive kmemcheck report
  ieee1394: annotate bitfield
  net: annotate bitfields in struct inet_sock
  net: use kmemcheck bitfields API for skbuff
  kmemcheck: introduce bitfield API
  kmemcheck: add opcode self-testing at boot
  x86: unify pte_hidden
  x86: make _PAGE_HIDDEN conditional
  kmemcheck: make kconfig accessible for other architectures
  kmemcheck: enable in the x86 Kconfig
  kmemcheck: add hooks for the page allocator
  kmemcheck: add hooks for page- and sg-dma-mappings
  kmemcheck: don't track page tables
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits)
  signal: fix __send_signal() false positive kmemcheck warning
  fs: fix do_mount_root() false positive kmemcheck warning
  fs: introduce __getname_gfp()
  trace: annotate bitfields in struct ring_buffer_event
  net: annotate struct sock bitfield
  c2port: annotate bitfield for kmemcheck
  net: annotate inet_timewait_sock bitfields
  ieee1394/csr1212: fix false positive kmemcheck report
  ieee1394: annotate bitfield
  net: annotate bitfields in struct inet_sock
  net: use kmemcheck bitfields API for skbuff
  kmemcheck: introduce bitfield API
  kmemcheck: add opcode self-testing at boot
  x86: unify pte_hidden
  x86: make _PAGE_HIDDEN conditional
  kmemcheck: make kconfig accessible for other architectures
  kmemcheck: enable in the x86 Kconfig
  kmemcheck: add hooks for the page allocator
  kmemcheck: add hooks for page- and sg-dma-mappings
  kmemcheck: don't track page tables
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>net: annotate struct sock bitfield</title>
<updated>2009-06-15T13:49:36+00:00</updated>
<author>
<name>Vegard Nossum</name>
<email>vegard.nossum@gmail.com</email>
</author>
<published>2009-02-26T13:46:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a98b65a3ad71e702e760bc63f57684301628e837'/>
<id>a98b65a3ad71e702e760bc63f57684301628e837</id>
<content type='text'>
2009/2/24 Ingo Molnar &lt;mingo@elte.hu&gt;:
&gt; ok, this is the last warning i have from today's overnight -tip
&gt; testruns - a 32-bit system warning in sock_init_data():
&gt;
&gt; [    2.610389] NET: Registered protocol family 16
&gt; [    2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
&gt; [    2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
&gt; [    2.624002] 010000000200000000000000604990c000000000000000000000000000000000
&gt; [    2.634076]  i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
&gt; [    2.641038]          ^
&gt; [    2.643376]
&gt; [    2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
&gt; [    2.648003] EIP: 0060:[&lt;c07141a1&gt;] EFLAGS: 00010282 CPU: 0
&gt; [    2.652008] EIP is at sock_init_data+0xa1/0x190
&gt; [    2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
&gt; [    2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
&gt; [    2.664003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
&gt; [    2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
&gt; [    2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
&gt; [    2.676003] DR6: ffff4ff0 DR7: 00000400
&gt; [    2.680002]  [&lt;c07423e5&gt;] __netlink_create+0x35/0xa0
&gt; [    2.684002]  [&lt;c07443cc&gt;] netlink_kernel_create+0x4c/0x140
&gt; [    2.688002]  [&lt;c072755e&gt;] rtnetlink_net_init+0x1e/0x40
&gt; [    2.696002]  [&lt;c071b601&gt;] register_pernet_operations+0x11/0x30
&gt; [    2.700002]  [&lt;c071b72c&gt;] register_pernet_subsys+0x1c/0x30
&gt; [    2.704002]  [&lt;c0bf3c8c&gt;] rtnetlink_init+0x4c/0x100
&gt; [    2.708002]  [&lt;c0bf4669&gt;] netlink_proto_init+0x159/0x170
&gt; [    2.712002]  [&lt;c0101124&gt;] do_one_initcall+0x24/0x150
&gt; [    2.716002]  [&lt;c0bbf3c7&gt;] do_initcalls+0x27/0x40
&gt; [    2.723201]  [&lt;c0bbf3fc&gt;] do_basic_setup+0x1c/0x20
&gt; [    2.728002]  [&lt;c0bbfb8a&gt;] kernel_init+0x5a/0xa0
&gt; [    2.732002]  [&lt;c0103e47&gt;] kernel_thread_helper+0x7/0x10
&gt; [    2.736002]  [&lt;ffffffff&gt;] 0xffffffff

We fix this false positive by annotating the bitfield in struct
sock.

Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Vegard Nossum &lt;vegard.nossum@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
2009/2/24 Ingo Molnar &lt;mingo@elte.hu&gt;:
&gt; ok, this is the last warning i have from today's overnight -tip
&gt; testruns - a 32-bit system warning in sock_init_data():
&gt;
&gt; [    2.610389] NET: Registered protocol family 16
&gt; [    2.616138] initcall netlink_proto_init+0x0/0x170 returned 0 after 7812 usecs
&gt; [    2.620010] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f642c184)
&gt; [    2.624002] 010000000200000000000000604990c000000000000000000000000000000000
&gt; [    2.634076]  i i i i i i u u i i i i i i i i i i i i i i i i i i i i i i i i
&gt; [    2.641038]          ^
&gt; [    2.643376]
&gt; [    2.644004] Pid: 1, comm: swapper Not tainted (2.6.29-rc6-tip-01751-g4d1c22c-dirty #885)
&gt; [    2.648003] EIP: 0060:[&lt;c07141a1&gt;] EFLAGS: 00010282 CPU: 0
&gt; [    2.652008] EIP is at sock_init_data+0xa1/0x190
&gt; [    2.656003] EAX: 0001a800 EBX: f6836c00 ECX: 00463000 EDX: c0e46fe0
&gt; [    2.660003] ESI: f642c180 EDI: c0b83088 EBP: f6863ed8 ESP: c0c412ec
&gt; [    2.664003]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
&gt; [    2.668003] CR0: 8005003b CR2: f682c400 CR3: 00b91000 CR4: 000006f0
&gt; [    2.672003] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
&gt; [    2.676003] DR6: ffff4ff0 DR7: 00000400
&gt; [    2.680002]  [&lt;c07423e5&gt;] __netlink_create+0x35/0xa0
&gt; [    2.684002]  [&lt;c07443cc&gt;] netlink_kernel_create+0x4c/0x140
&gt; [    2.688002]  [&lt;c072755e&gt;] rtnetlink_net_init+0x1e/0x40
&gt; [    2.696002]  [&lt;c071b601&gt;] register_pernet_operations+0x11/0x30
&gt; [    2.700002]  [&lt;c071b72c&gt;] register_pernet_subsys+0x1c/0x30
&gt; [    2.704002]  [&lt;c0bf3c8c&gt;] rtnetlink_init+0x4c/0x100
&gt; [    2.708002]  [&lt;c0bf4669&gt;] netlink_proto_init+0x159/0x170
&gt; [    2.712002]  [&lt;c0101124&gt;] do_one_initcall+0x24/0x150
&gt; [    2.716002]  [&lt;c0bbf3c7&gt;] do_initcalls+0x27/0x40
&gt; [    2.723201]  [&lt;c0bbf3fc&gt;] do_basic_setup+0x1c/0x20
&gt; [    2.728002]  [&lt;c0bbfb8a&gt;] kernel_init+0x5a/0xa0
&gt; [    2.732002]  [&lt;c0103e47&gt;] kernel_thread_helper+0x7/0x10
&gt; [    2.736002]  [&lt;ffffffff&gt;] 0xffffffff

We fix this false positive by annotating the bitfield in struct
sock.

Reported-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Vegard Nossum &lt;vegard.nossum@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: No more expensive sock_hold()/sock_put() on each tx</title>
<updated>2009-06-11T09:55:43+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-06-11T09:55:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2b85a34e911bf483c27cfdd124aeb1605145dc80'/>
<id>2b85a34e911bf483c27cfdd124aeb1605145dc80</id>
<content type='text'>
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.

skb_set_owner_w() doesnt call sock_hold() anymore

sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb-&gt;truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.

skb_set_owner_w() doesnt call sock_hold() anymore

sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb-&gt;truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
