summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-07-28vlan: mask vlan prio bitsEric Dumazet
[ Upstream commit d4b812dea4a236f729526facf97df1a9d18e191c ] In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e ("vlan: don't deliver frames for unknown vlans to protocols") Florian made sure we set pkt_type to PACKET_OTHERHOST if the vlan id is set and we could find a vlan device for this particular id. But we also have a problem if prio bits are set. Steinar reported an issue on a router receiving IPv6 frames with a vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device, because skb->vlan_tci is set. Forwarded frame is completely corrupted : We can see (8100:4000) being inserted in the middle of IPv6 source address : 16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 > 9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64 0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651 0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213 0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 It seems we are not really ready to properly cope with this right now. We can probably do better in future kernels : vlan_get_ingress_priority() should be a netdev property instead of a per vlan_dev one. For stable kernels, lets clear vlan_tci to fix the bugs. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang
[ Upstream commit ece793fcfc417b3925844be88a6a6dc82ae8f7c6 ] We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGSJason Wang
[ Upstream commit 885291761dba2bfe04df4c0f7bb75e4c920ab82e ] We try to linearize part of the skb when the number of iov is greater than MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest network. Solve this problem by calculate the pages needed for iov before trying to do zerocopy and switch to use copy instead of zerocopy if it needs more than MAX_SKB_FRAGS. This is done through introducing a new helper to count the pages for iov, and call uarg->callback() manually when switching from zerocopy to copy to notify vhost. We can do further optimization on top. The bug were introduced from commit 0690899b4d4501b3505be069b9a687e68ccbe15b (tun: experimental zero copy tx support) Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28pkt_sched: sch_qfq: remove a source of high packet delay/jitterPaolo Valente
[ Upstream commit 87f40dd6ce7042caca0b3b557e8923127f51f902 ] QFQ+ inherits from QFQ a design choice that may cause a high packet delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a special quantity, the system virtual time, to track the service provided by the ideal system it approximates. When a packet is dequeued, this quantity must be incremented by the size of the packet, divided by the sum of the weights of the aggregates waiting to be served. Tracking this sum correctly is a non-trivial task, because, to preserve tight service guarantees, the decrement of this sum must be delayed in a special way [1]: this sum can be decremented only after that its value would decrease also in the ideal system approximated by QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous' weight sum, increased and decreased immediately as the weight of an aggregate changes, and as an aggregate is created or destroyed (which, in its turn, happens as a consequence of some class being created/destroyed/changed). However, to avoid the problems caused to service guarantees by these immediate decreases, QFQ+ increments the system virtual time using the maximum value allowed for the weight sum, 2^10, in place of the dynamic, instantaneous value. The instantaneous value of the weight sum is used only to check whether a request of weight increase or a class creation can be satisfied. Unfortunately, the problems caused by this choice are worse than the temporary degradation of the service guarantees that may occur, when a class is changed or destroyed, if the instantaneous value of the weight sum was used to update the system virtual time. In fact, the fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is equal to the ratio between the weight of the aggregate and the sum of the weights of the competing aggregates. The packet delay guaranteed to the aggregate is instead inversely proportional to the guaranteed bandwidth. By using the maximum possible value, and not the actual value of the weight sum, QFQ+ provides each aggregate with the worst possible service guarantees, and not with service guarantees related to the actual set of competing aggregates. To see the consequences of this fact, consider the following simple example. Suppose that only the following aggregates are backlogged, i.e., that only the classes in the following aggregates have packets to transmit: one aggregate with weight 10, say A, and ten aggregates with weight 1, say B1, B2, ..., B10. In particular, suppose that these aggregates are always backlogged. Given the weight distribution, the smoothest and fairest service order would be: A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ... QFQ+ would provide exactly this optimal service if it used the actual value for the weight sum instead of the maximum possible value, i.e., 11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it serves aggregates as follows (easy to prove and to reproduce experimentally): A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ... By replacing 10 with N in the above example, and by increasing N, one can increase at will the maximum packet delay and the jitter experienced by the classes in aggregate A. This patch addresses this issue by just using the above 'instantaneous' value of the weight sum, instead of the maximum possible value, when updating the system virtual time. After the instantaneous weight sum is decreased, QFQ+ may deviate from the ideal service for a time interval in the order of the time to serve one maximum-size packet for each backlogged class. The worst-case extent of the deviation exhibited by QFQ+ during this time interval [1] is basically the same as of the deviation described above (but, without this patch, QFQ+ suffers from such a deviation all the time). Finally, this patch modifies the comment to the function qfq_slot_insert, to make it coherent with the fact that the weight sum used by QFQ+ can now be lower than the maximum possible value. [1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix", Proceedings of AAA-IDEA'05, June 2005. Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28hyperv: Fix the NETIF_F_SG flag setting in netvscHaiyang Zhang
[ Upstream commit f45708209dc445bac0844f6ce86e315a2ffe8a29 ] SG mode is not currently supported by netvsc, so remove this flag for now. Otherwise, it will be unconditionally enabled by commit ec5f0615642 "Kill link between CSUM and SG features" Previously, the SG feature is disabled because CSUM is not set here. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28be2net: Fix to avoid hardware workaround when not neededSarveshwar Bandi
[ Upstream commit 52fe29e4bb614367c108b717c6d7fe5953eb7af3 ] Hardware workaround requesting hardware to skip vlan insertion is necessary only when umc or qnq is enabled. Enabling this workaround in other scenarios could cause controller to stall. Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv4: set transport header earlierEric Dumazet
[ Upstream commit 21d1196a35f5686c4323e42a62fdb4b23b0ab4a3 ] commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a performance regression for non GRO traffic, basically disabling IP early demux. IPv6 stack resets transport header in ip6_rcv() before calling IP early demux in ip6_rcv_finish(), while IPv4 does this only in ip_local_deliver_finish(), _after_ IP early demux. GRO traffic happened to enable IP early demux because transport header is also set in inet_gro_receive() Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the same : transport_header should be set in ip_rcv() instead of ip_local_deliver_finish() ip_local_deliver_finish() can also use skb_network_header_len() which is faster than ip_hdrlen() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28atl1e: unmap partially mapped skb on dma error and free skbNeil Horman
[ Upstream commit 584ec4355355ffac43571b02a314d43eb2f7fcbf ] Ben Hutchings pointed out that my recent update to atl1e in commit 352900b583b2852152a1e05ea0e8b579292e731e ("atl1e: fix dma mapping warnings") was missing a bit of code. Specifically it reset the hardware tx ring to its origional state when we hit a dma error, but didn't unmap any exiting mappings from the operation. This patch fixes that up. It also remembers to free the skb in the event that an error occurs, so we don't leak. Untested, as I don't have hardware. I think its pretty straightforward, but please review closely. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jay Cliburn <jcliburn@gmail.com> CC: Chris Snook <chris.snook@gmail.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28atl1e: fix dma mapping warningsNeil Horman
[ Upstream commit 352900b583b2852152a1e05ea0e8b579292e731e ] Recently had this backtrace reported: WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930() Hardware name: System Product Name ATL1E 0000:02:00.0: DMA-API: device driver failed to check map error[device address=0x00000000cbfd1000] [size=90 bytes] [mapped as single] Modules linked in: xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support snd_hda_intel acpi_cpufreq mperf coretemp btrfs zlib_deflate snd_hda_codec snd_hwdep microcode raid6_pq libcrc32c snd_seq usblp serio_raw xor snd_seq_device joydev snd_pcm snd_page_alloc snd_timer snd lpc_ich i2c_i801 soundcore mfd_core atl1e asus_atk0110 ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core pata_marvell uinput Pid: 314, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.3.fc19.x86_64 #1 Call Trace: <IRQ> [<ffffffff81069106>] warn_slowpath_common+0x66/0x80 [<ffffffff8106916c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8138151d>] check_unmap+0x47d/0x930 [<ffffffff810ad048>] ? sched_clock_cpu+0xa8/0x100 [<ffffffff81381a2f>] debug_dma_unmap_page+0x5f/0x70 [<ffffffff8137ce30>] ? unmap_single+0x20/0x30 [<ffffffffa01569a1>] atl1e_intr+0x3a1/0x5b0 [atl1e] [<ffffffff810d53fd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff81119636>] handle_irq_event_percpu+0x56/0x390 [<ffffffff811199ad>] handle_irq_event+0x3d/0x60 [<ffffffff8111cb6a>] handle_fasteoi_irq+0x5a/0x100 [<ffffffff8101c36f>] handle_irq+0xbf/0x150 [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50 [<ffffffff81073b10>] ? irq_enter+0x50/0xa0 [<ffffffff8172738d>] do_IRQ+0x4d/0xc0 [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50 [<ffffffff8171c6b2>] common_interrupt+0x72/0x72 <EOI> [<ffffffff810db5b2>] ? lock_release+0xc2/0x310 [<ffffffff8109ea04>] lg_local_unlock_cpu+0x24/0x50 [<ffffffff811dcb2f>] file_sb_list_del+0x3f/0x50 [<ffffffff811dcb6d>] fput+0x2d/0xc0 [<ffffffff811d8ea1>] filp_close+0x61/0x90 [<ffffffff811fae4d>] __close_fd+0x8d/0x150 [<ffffffff811d8ef0>] sys_close+0x20/0x50 [<ffffffff81725699>] system_call_fastpath+0x16/0x1b The usual straighforward failure to check for dma_mapping_error after a map operation is completed. This patch should fix it, the reporter wandered off after filing this bz: https://bugzilla.redhat.com/show_bug.cgi?id=954170 and I don't have hardware to test, but the fix is pretty straightforward, so I figured I'd post it for review. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Jay Cliburn <jcliburn@gmail.com> CC: Chris Snook <chris.snook@gmail.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: only static routes qualify for equal cost multipathingHannes Frederic Sowa
[ Upstream commit 307f2fb95e9b96b3577916e73d92e104f8f26494 ] Static routes in this case are non-expiring routes which did not get configured by autoconf or by icmpv6 redirects. To make sure we actually get an ecmp route while searching for the first one in this fib6_node's leafs, also make sure it matches the ecmp route assumptions. v2: a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF already ensures that this route, even if added again without RTF_EXPIRES (in case of a RA announcement with infinite timeout), does not cause the rt6i_nsiblings logic to go wrong if a later RA updates the expiration time later. v3: a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so, because an pmtu event could update the RTF_EXPIRES flag and we would not count this route, if another route joins this set. We now filter only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that don't get changed after rt6_info construction. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28gre: Fix MTU sizing check for gretap tunnelsAlexander Duyck
[ Upstream commit 8c91e162e058bb91b7766f26f4d5823a21941026 ] This change fixes an MTU sizing issue seen with gretap tunnels when non-gso packets are sent from the interface. In my case I was able to reproduce the issue by simply sending a ping of 1421 bytes with the gretap interface created on a device with a standard 1500 mtu. This fix is based on the fact that the tunnel mtu is already adjusted by dev->hard_header_len so it would make sense that any packets being compared against that mtu should also be adjusted by hard_header_len and the tunnel header instead of just the tunnel header. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reported-by: Cong Wang <amwang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ifb: fix oops when loading the ifb faileddingtianhong
[ Upstream commit f2966cd5691058b8674a20766525bedeaea9cbcf ] If __rtnl_link_register() return faild when loading the ifb, it will take the wrong path and get oops, so fix it just like dummy. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28dummy: fix oops when loading the dummy faileddingtianhong
[ Upstream commit 2c8a01894a12665d8059fad8f0a293c98a264121 ] We rename the dummy in modprobe.conf like this: install dummy0 /sbin/modprobe -o dummy0 --ignore-install dummy install dummy1 /sbin/modprobe -o dummy1 --ignore-install dummy We got oops when we run the command: modprobe dummy0 modprobe dummy1 ------------[ cut here ]------------ [ 3302.187584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 3302.195411] IP: [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0 [ 3302.201844] PGD 85c94a067 PUD 8517bd067 PMD 0 [ 3302.206305] Oops: 0002 [#1] SMP [ 3302.299737] task: ffff88105ccea300 ti: ffff880eba4a0000 task.ti: ffff880eba4a0000 [ 3302.307186] RIP: 0010:[<ffffffff813fe62a>] [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0 [ 3302.316044] RSP: 0018:ffff880eba4a1dd8 EFLAGS: 00010246 [ 3302.321332] RAX: 0000000000000000 RBX: ffffffff81a9d738 RCX: 0000000000000002 [ 3302.328436] RDX: 0000000000000000 RSI: ffffffffa04d602c RDI: ffff880eba4a1dd8 [ 3302.335541] RBP: ffff880eba4a1e18 R08: dead000000200200 R09: dead000000100100 [ 3302.342644] R10: 0000000000000080 R11: 0000000000000003 R12: ffffffff81a9d788 [ 3302.349748] R13: ffffffffa04d7020 R14: ffffffff81a9d670 R15: ffff880eba4a1dd8 [ 3302.364910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3302.370630] CR2: 0000000000000008 CR3: 000000085e15e000 CR4: 00000000000427e0 [ 3302.377734] DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001 [ 3302.384838] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 3302.391940] Stack: [ 3302.393944] ffff880eba4a1dd8 ffff880eba4a1dd8 ffff880eba4a1e18 ffffffffa04d70c0 [ 3302.401350] 00000000ffffffef ffffffffa01a8000 0000000000000000 ffffffff816111c8 [ 3302.408758] ffff880eba4a1e48 ffffffffa01a80be ffff880eba4a1e48 ffffffffa04d70c0 [ 3302.416164] Call Trace: [ 3302.418605] [<ffffffffa01a8000>] ? 0xffffffffa01a7fff [ 3302.423727] [<ffffffffa01a80be>] dummy_init_module+0xbe/0x1000 [dummy0] [ 3302.430405] [<ffffffffa01a8000>] ? 0xffffffffa01a7fff [ 3302.435535] [<ffffffff81000322>] do_one_initcall+0x152/0x1b0 [ 3302.441263] [<ffffffff810ab24b>] do_init_module+0x7b/0x200 [ 3302.446824] [<ffffffff810ad3d2>] load_module+0x4e2/0x530 [ 3302.452215] [<ffffffff8127ae40>] ? ddebug_dyndbg_boot_param_cb+0x60/0x60 [ 3302.458979] [<ffffffff810ad5f1>] SyS_init_module+0xd1/0x130 [ 3302.464627] [<ffffffff814b9652>] system_call_fastpath+0x16/0x1b [ 3302.490090] RIP [<ffffffff813fe62a>] __rtnl_link_unregister+0x9a/0xd0 [ 3302.496607] RSP <ffff880eba4a1dd8> [ 3302.500084] CR2: 0000000000000008 [ 3302.503466] ---[ end trace 8342d49cd49f78ed ]--- The reason is that when loading dummy, if __rtnl_link_register() return failed, the init_module should return and avoid take the wrong path. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREFHannes Frederic Sowa
[ Upstream commit afc154e978de1eb11c555bc8bcec1552f75ebc43 ] This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52 ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available"). Since the removal of rt->n in rt6_info we can end up with a dst == NULL in rt6_check_neigh. In case the kernel is not compiled with CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown NUD state but we must not avoid doing round robin selection on routes with the same target. So introduce and pass down a boolean ``do_rr'' to indicate when we should update rt->rr_ptr. As soon as no route is valid we do backtracking and do a lookup on a higher level in the fib trie. v2: a) Improved rt6_check_neigh logic (no need to create neighbour there) and documented return values. v3: a) Introduce enum rt6_nud_state to get rid of the magic numbers (thanks to David Miller). b) Update and shorten commit message a bit to actualy reflect the source. Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28alx: fix lockdep annotationMaarten Lankhorst
[ Upstream commit a8798a5c77c9981e88caef1373a3310bf8aed219 ] Move spin_lock_init to be called before the spinlocks are used, preventing a lockdep splat. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-289p: fix off by one causing access violations and memory corruptionSasha Levin
[ Upstream commit 110ecd69a9feea82a152bbf9b12aba57e6396883 ] p9_release_pages() would attempt to dereference one value past the end of pages[]. This would cause the following crashes: [ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000 [ 6293.174146] IP: [<ffffffff8412793b>] p9_release_pages+0x3b/0x60 [ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060 [ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 6293.180060] Modules linked in: [ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G W 3.10.0-next-20130710-sasha #3954 [ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000 [ 6293.180060] RIP: 0010:[<ffffffff8412793b>] [<ffffffff8412793b>] p9_release_pages+0x3b/0x60 [ 6293.214316] RSP: 0000:ffff880787ddfc28 EFLAGS: 00010202 [ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000 [ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40 [ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000 [ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001 [ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070 [ 6293.222017] FS: 00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000 [ 6293.256784] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0 [ 6293.256784] Stack: [ 6293.256784] ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8 [ 6293.256784] ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001 [ 6293.256784] ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000 [ 6293.256784] Call Trace: [ 6293.256784] [<ffffffff84128be8>] p9_virtio_zc_request+0x598/0x630 [ 6293.256784] [<ffffffff8115c610>] ? wake_up_bit+0x40/0x40 [ 6293.256784] [<ffffffff841209b1>] p9_client_zc_rpc+0x111/0x3a0 [ 6293.256784] [<ffffffff81174b78>] ? sched_clock_cpu+0x108/0x120 [ 6293.256784] [<ffffffff84122a21>] p9_client_read+0xe1/0x2c0 [ 6293.256784] [<ffffffff81708a90>] v9fs_file_read+0x90/0xc0 [ 6293.256784] [<ffffffff812bd073>] vfs_read+0xc3/0x130 [ 6293.256784] [<ffffffff811a78bd>] ? trace_hardirqs_on+0xd/0x10 [ 6293.256784] [<ffffffff812bd5a2>] SyS_read+0x62/0xa0 [ 6293.256784] [<ffffffff841a1a00>] tracesys+0xdd/0xe2 [ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 <48> 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c [ 6293.256784] RIP [<ffffffff8412793b>] p9_release_pages+0x3b/0x60 [ 6293.256784] RSP <ffff880787ddfc28> [ 6293.256784] CR2: ffff8807c96f3000 [ 6293.256784] ---[ end trace 50822ee72cd360fc ]--- Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: in case of link failure remove route directly instead of letting it expireHannes Frederic Sowa
[ Upstream commit 1eb4f758286884e7566627164bca4c4a16952a83 ] We could end up expiring a route which is part of an ecmp route set. Doing so would invalidate the rt->rt6i_nsiblings calculations and could provoke the following panic: [ 80.144667] ------------[ cut here ]------------ [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733! [ 80.145172] invalid opcode: 0000 [#1] SMP [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118 [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000 [ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202 [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480 [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738 [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001 [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680 [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90 [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0 [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 80.145172] Stack: [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280 [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680 [ 80.145172] Call Trace: [ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 [ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 [ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 [ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 [ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 [ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 [ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 [ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 [ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40 [ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50 [ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 [ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20 [ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 [ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 [ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 [ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 [ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170 [ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 [ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 [ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 [ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53 [ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP <ffff880118771798> [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]--- [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28macvtap: correctly linearize skb when zerocopy is usedJason Wang
[ Upstream commit 61d46bf979d5cd7c164709a80ad5676a35494aae ] Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to linearize parts of the skb to let the rest of iov to be fit in the frags, we need count copylen into linear when calling macvtap_alloc_skb() instead of partly counting it into data_len. Since this breaks zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should be zero at beginning. This cause nr_frags to be increased wrongly without setting the correct frags. This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73 (macvtap: zerocopy: validate vectors before building skb). Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28tuntap: correctly linearize skb when zerocopy is usedJason Wang
[ Upstream commit 3dd5c3308e8b671e8e8882ba972f51cefbe9fd0d ] Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to linearize parts of the skb to let the rest of iov to be fit in the frags, we need count copylen into linear when calling tun_alloc_skb() instead of partly counting it into data_len. Since this breaks zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should be zero at beginning. This cause nr_frags to be increased wrongly without setting the correct frags. This bug were introduced from 0690899b4d4501b3505be069b9a687e68ccbe15b (tun: experimental zero copy tx support) Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ifb: fix rcu_sched self-detected stallsdingtianhong
[ Upstream commit 440d57bc5ff55ec1efb3efc9cbe9420b4bbdfefa ] According to the commit 16b0dc29c1af9df341428f4c49ada4f626258082 (dummy: fix rcu_sched self-detected stalls) Eric Dumazet fix the problem in dummy, but the ifb will occur the same problem like the dummy modules. Trying to "modprobe ifb numifbs=30000" triggers : INFO: rcu_sched self-detected stall on CPU After this splat, RTNL is locked and reboot is needed. We must call cond_resched() to avoid this, even holding RTNL. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28sunvnet: vnet_port_remove must call unregister_netdevDave Kleikamp
[ Upstream commit aabb9875d02559ab9b928cd6f259a5cc4c21a589 ] The missing call to unregister_netdev() leaves the interface active after the driver is unloaded by rmmod. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28vhost-net: fix use-after-free in vhost_net_flushMichael S. Tsirkin
[ Upstream commit c38e39c378f46f00ce922dd40a91043a9925c28d ] vhost_net_ubuf_put_and_wait has a confusing name: it will actually also free it's argument. Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01 "vhost-net: flush outstanding DMAs on memory change" vhost_net_flush tries to use the argument after passing it to vhost_net_ubuf_put_and_wait, this results in use after free. To fix, don't free the argument in vhost_net_ubuf_put_and_wait, add an new API for callers that want to free ubufs. Acked-by: Asias He <asias@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28virtio_net: fix race in RX VQ processingMichael S. Tsirkin
[ Upstream commit cbdadbbf0c790f79350a8f36029208944c5487d0 ] virtio net called virtqueue_enable_cq on RX path after napi_complete, so with NAPI_STATE_SCHED clear - outside the implicit napi lock. This violates the requirement to synchronize virtqueue_enable_cq wrt virtqueue_add_buf. In particular, used event can move backwards, causing us to lose interrupts. In a debug build, this can trigger panic within START_USE. Jason Wang reports that he can trigger the races artificially, by adding udelay() in virtqueue_enable_cb() after virtio_mb(). However, we must call napi_complete to clear NAPI_STATE_SCHED before polling the virtqueue for used buffers, otherwise napi_schedule_prep in a callback will fail, causing us to lose RX events. To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED set (under napi lock), later call virtqueue_poll with NAPI_STATE_SCHED clear (outside the lock). Reported-by: Jason Wang <jasowang@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28virtio: support unlocked queue pollMichael S. Tsirkin
[ Upstream commit cc229884d3f77ec3b1240e467e0236c3e0647c0c ] This adds a way to check ring empty state after enable_cb outside any locks. Will be used by virtio_net. Note: there's room for more optimization: caller is likely to have a memory barrier already, which means we might be able to get rid of a barrier here. Deferring this optimization until we do some benchmarking. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bitJongsung Kim
[ Upstream commit 01276ed2424eb78c95461545410923d5da154d31 ] Signed-off-by: Jongsung Kim <neidhard.kim@lge.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28sfc: Fix memory leak when discarding scattered packetsBen Hutchings
[ Upstream commit 734d4e159b283a4ae4d007b7e7a91d84398ccb92 ] Commit 2768935a4660 ('sfc: reuse pages to avoid DMA mapping/unmapping costs') did not fully take account of DMA scattering which was introduced immediately before. If a received packet is invalid and must be discarded, we only drop a reference to the first buffer's page, but we need to drop a reference for each buffer the packet used. I think this bug was missed partly because efx_recycle_rx_buffers() was not renamed and so no longer does what its name says. It does not change the state of buffers, but only prepares the underlying pages for recycling. Rename it accordingly. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: rt6_check_neigh should successfully verify neigh if no NUD information ↵Hannes Frederic Sowa
are available [ Upstream commit 3630d40067a21d4dfbadc6002bb469ce26ac5d52 ] After the removal of rt->n we do not create a neighbour entry at route insertion time (rt6_bind_neighbour is gone). As long as no neighbour is created because of "useful traffic" we skip this routing entry because rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and thus returns false. This change was introduced by commit 887c95cc1da53f66a5890fdeab13414613010097 ("ipv6: Complete neighbour entry removal from dst_entry.") To quote RFC4191: "If the host has no information about the router's reachability, then the host assumes the router is reachable." and also: "A host MUST NOT probe a router's reachability in the absence of useful traffic that the host would have sent to the router if it were reachable." So, just assume the router is reachable and let's rt6_probe do the rest. We don't need to create a neighbour on route insertion time. If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support) a neighbour is only valid if its nud_state is NUD_VALID. I did not find any references that we should probe the router on route insertion time via the other RFCs. So skip this route in that case. v2: a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov) Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_sizeHannes Frederic Sowa
[ Upstream commit 75a493e60ac4bbe2e977e7129d6d8cbb0dd236be ] If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track of this when appending the second frame on a corked socket. This results in the following splat: [37598.993962] ------------[ cut here ]------------ [37598.994008] kernel BUG at net/core/skbuff.c:2064! [37598.994008] invalid opcode: 0000 [#1] SMP [37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat +nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi +scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm [37598.994008] snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc +dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video [37598.994008] CPU 0 [37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG [37598.994008] RIP: 0010:[<ffffffff815443a5>] [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP: 0018:ffff88003670da18 EFLAGS: 00010202 [37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0 [37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00 [37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040 [37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8 [37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000 [37598.994008] FS: 00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 [37598.994008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0 [37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0) [37598.994008] Stack: [37598.994008] ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8 [37598.994008] ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200 [37598.994008] 0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4 [37598.994008] Call Trace: [37598.994008] [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0 [37598.994008] [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0 [37598.994008] [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40 [37598.994008] [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10 [37598.994008] [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90 [37598.994008] [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0 [37598.994008] [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30 [37598.994008] [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0 [37598.994008] [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0 [37598.994008] [<ffffffff8153d97d>] sys_sendto+0x12d/0x180 [37598.994008] [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0 [37598.994008] [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240 [37598.994008] [<ffffffff8166a7e7>] tracesys+0xdd/0xe2 [37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48 [37598.994008] RIP [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP <ffff88003670da18> [37599.007323] ---[ end trace d69f6a17f8ac8eee ]--- While there, also check if path mtu discovery is activated for this socket. The logic was adapted from ip6_append_data when first writing on the corked socket. This bug was introduced with commit 0c1833797a5a6ec23ea9261d979aa18078720b74 ("ipv6: fix incorrect ipsec fragment"). v2: a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE. b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao feng, thanks!). c) Change mtu to unsigned int, else we get a warning about non-matching types because of the min()-macro type-check. Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa
pending data [ Upstream commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 ] We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipip: fix a regression in ioctlCong Wang
[ Upstream commit 3b7b514f44bff05d26a6499c4d4fac2a83938e6e ] This is a regression introduced by commit fd58156e456d9f68fe0448 (IPIP: Use ip-tunneling code.) Similar to GRE tunnel, previously we only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL, after that commit, the check is moved for all commands. So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL. Also, the check for i_key, o_key etc. is suspicious too, which did not exist before, reset them before passing to ip_tunnel_ioctl(). Signed-off-by: Cong Wang <amwang@redhat.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28l2tp: add missing .owner to struct pppox_protoWei Yongjun
[ Upstream commit e1558a93b61962710733dc8c11a2bc765607f1cd ] Add missing .owner of struct pppox_proto. This prevents the module from being removed from underneath its users. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ip_tunnels: Use skb-len to PMTU check.Pravin B Shelar
[ Upstream commit 23a3647bc4f93bac3776c66dc2c7f7f68b3cd662 ] In path mtu check, ip header total length works for gre device but not for gre-tap device. Use skb len which is consistent for all tunneling types. This is old bug in gre. This also fixes mtu calculation bug introduced by commit c54419321455631079c7d (GRE: Refactor GRE tunneling code). Reported-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6,mcast: always hold idev->lock before mca_lockAmerigo Wang
[ Upstream commit 8965779d2c0e6ab246c82a405236b1fb2adae6b2, with some bits from commit b7b1bfce0bb68bd8f6e62a28295922785cc63781 ("ipv6: split duplicate address detection and router solicitation timer") to get the __ipv6_get_lladdr() used by this patch. ] dingtianhong reported the following deadlock detected by lockdep: ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.24.05-0.1-default #1 Not tainted ------------------------------------------------------- ksoftirqd/0/3 is trying to acquire lock: (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 but task is already holding lock: (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mc->mca_lock){+.+...}: [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60 [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120 [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60 [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80 [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0 [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80 [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20 [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60 [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80 [<ffffffff813d9360>] dev_change_flags+0x40/0x70 [<ffffffff813ea627>] do_setlink+0x237/0x8a0 [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600 [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310 [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0 [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40 [<ffffffff81403e20>] netlink_unicast+0x140/0x180 [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380 [<ffffffff813c4252>] sock_sendmsg+0x112/0x130 [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460 [<ffffffff813c5544>] sys_sendmsg+0x44/0x70 [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b -> #0 (&ndev->lock){+.+...}: [<ffffffff810a798e>] check_prev_add+0x3de/0x440 [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f6c82>] rt_read_lock+0x42/0x60 [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 [<ffffffff8149b036>] mld_newpack+0xb6/0x160 [<ffffffff8149b18b>] add_grhead+0xab/0xc0 [<ffffffff8149d03b>] add_grec+0x3ab/0x460 [<ffffffff8149d14a>] mld_send_report+0x5a/0x150 [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0 [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0 [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0 [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0 [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0 [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210 [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0 [<ffffffff8106c7de>] kthread+0xae/0xc0 [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10 actually we can just hold idev->lock before taking pmc->mca_lock, and avoid taking idev->lock again when iterating idev->addr_list, since the upper callers of mld_newpack() already take read_lock_bh(&idev->lock). Reported-by: dingtianhong <dingtianhong@huawei.com> Cc: dingtianhong <dingtianhong@huawei.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Chen Weilong <chenweilong@huawei.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28vti: remove duplicated code to fix a memory leakCong Wang
[ Upstream commit ab6c7a0a43c2eaafa57583822b619b22637b49c7 ] vti module allocates dev->tstats twice: in vti_fb_tunnel_init() and in vti_tunnel_init(), this lead to a memory leak of dev->tstats. Just remove the duplicated operations in vti_fb_tunnel_init(). (candidate for -stable) Signed-off-by: Cong Wang <amwang@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Saurabh Mohan <saurabh.mohan@vyatta.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28gre: fix a regression in ioctlCong Wang
[ Upstream commit 6c734fb8592f6768170e48e7102cb2f0a1bb9759 ] When testing GRE tunnel, I got: # ip tunnel show get tunnel gre0 failed: Invalid argument get tunnel gre1 failed: Invalid argument This is a regression introduced by commit c54419321455631079c7d ("GRE: Refactor GRE tunneling code.") because previously we only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL, after that commit, the check is moved for all commands. So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL. After this patch I got: # ip tunnel show gre0: gre/ip remote any local any ttl inherit nopmtudisc gre1: gre/ip remote 192.168.122.101 local 192.168.122.45 ttl inherit Signed-off-by: Cong Wang <amwang@redhat.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28net: Swap ver and type in pppoe_hdrChangli Gao
[ Upstream commit b1a5a34bd0b8767ea689e68f8ea513e9710b671e ] Ver and type in pppoe_hdr should be swapped as defined by RFC2516 section-4. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28x25: Fix broken locking in ioctl error paths.Dave Jones
[ Upstream commit 4ccb93ce7439b63c31bc7597bfffd13567fa483d ] Two of the x25 ioctl cases have error paths that break out of the function without unlocking the socket, leading to this warning: ================================================ [ BUG: lock held when returning to user space! ] 3.10.0-rc7+ #36 Not tainted ------------------------------------------------ trinity-child2/31407 is leaving the kernel with locks still held! 1 lock held by trinity-child2/31407: #0: (sk_lock-AF_X25){+.+.+.}, at: [<ffffffffa024b6da>] x25_ioctl+0x8a/0x740 [x25] Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28neighbour: fix a race in neigh_destroy()Eric Dumazet
[ Upstream commit c9ab4d85de222f3390c67aedc9c18a50e767531e ] There is a race in neighbour code, because neigh_destroy() uses skb_queue_purge(&neigh->arp_queue) without holding neighbour lock, while other parts of the code assume neighbour rwlock is what protects arp_queue Convert all skb_queue_purge() calls to the __skb_queue_purge() variant Use __skb_queue_head_init() instead of skb_queue_head_init() to make clear we do not use arp_queue.lock And hold neigh->lock in neigh_destroy() to close the race. Reported-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: only apply anti-spoofing checks to not-pointopoint tunnelsHannes Frederic Sowa
[ Upstream commit 5c29fb12e8fb8a8105ea048cb160fd79a85a52bb ] Because of commit 218774dc341f219bfcf940304a081b121a0e8099 ("ipv6: add anti-spoofing checks for 6to4 and 6rd") the sit driver dropped packets for 2002::/16 destinations and sources even when configured to work as a tunnel with fixed endpoint. We may only apply the 6rd/6to4 anti-spoofing checks if the device is not in pointopoint mode. This was an oversight from me in the above commit, sorry. Thanks to Roman Mamedov for reporting this! Reported-by: Roman Mamedov <rm@romanrm.ru> Cc: David Miller <davem@davemloft.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28sparc32: vm_area_struct access for old Sun SPARCs.Olivier DANET
upstream commit 961246b4ed8da3bcf4ee1eb9147f341013553e3c. Commit e4c6bfd2d79d063017ab19a18915f0bc759f32d9 ("mm: rearrange vm_area_struct for fewer cache misses") changed the layout of the vm_area_struct structure, it broke several SPARC32 assembly routines which used numerical constants for accessing the vm_mm field. This patch defines the VMA_VM_MM constant to replace the immediate values. Signed-off-by: Olivier DANET <odanet@caramail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28writeback: Fix periodic writeback after fs mountJan Kara
commit a5faeaf9109578e65e1a32e2a3e76c8b47e7dcb6 upstream. Code in blkdev.c moves a device inode to default_backing_dev_info when the last reference to the device is put and moves the device inode back to its bdi when the first reference is acquired. This includes moving to wb.b_dirty list if the device inode is dirty. The code however doesn't setup timer to wake corresponding flusher thread and while wb.b_dirty list is non-empty __mark_inode_dirty() will not set it up either. Thus periodic writeback is effectively disabled until a sync(2) call which can lead to unexpected data loss in case of crash or power failure. Fix the problem by setting up a timer for periodic writeback in case we add the first dirty inode to wb.b_dirty list in bdev_inode_switch_bdi(). Reported-by: Bert De Jonghe <Bert.DeJonghe@amplidata.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25Linux 3.10.3v3.10.3Greg Kroah-Hartman
2013-07-25tracing: Add trace_array_get/put() to event handlingSteven Rostedt (Red Hat)
commit 8e2e2fa47129532a30cff6c25a47078dc97d9260 upstream. Commit a695cb58162 "tracing: Prevent deleting instances when they are being read" tried to fix a race between deleting a trace instance and reading contents of a trace file. But it wasn't good enough. The following could crash the kernel: # cd /sys/kernel/debug/tracing/instances # ( while :; do mkdir foo; rmdir foo; done ) & # ( while :; do echo 1 > foo/events/sched/sched_switch 2> /dev/null; done ) & Luckily this can only be done by root user, but it should be fixed regardless. The problem is that a delete of the file can happen after the write to the event is opened, but before the enabling happens. The solution is to make sure the trace_array is available before succeeding in opening for write, and incerment the ref counter while opened. Now the instance can be deleted when the events are writing to the buffer, but the deletion of the instance will disable all events before the instance is actually deleted. Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25tracing: Fix race between deleting buffer and setting eventsSteven Rostedt (Red Hat)
commit 2a6c24afab70dbcfee49f4c76e1511eec1a3298b upstream. While analyzing the code, I discovered that there's a potential race between deleting a trace instance and setting events. There are a few races that can occur if events are being traced as the buffer is being deleted. Mostly the problem comes with freeing the descriptor used by the trace event callback. To prevent problems like this, the events are disabled before the buffer is deleted. The problem with the current solution is that the event_mutex is let go between disabling the events and freeing the files, which means that the events could be enabled again while the freeing takes place. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25tracing: Get trace_array ref counts when accessing trace filesSteven Rostedt (Red Hat)
commit 7b85af63034818e43aee6c1d7bf1c7c6796a9073 upstream. When a trace file is opened that may access a trace array, it must increment its ref count to prevent it from being deleted. Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25tracing: Add trace_array_get/put() to handle instance refs betterSteven Rostedt (Red Hat)
commit ff451961a8b2a17667a7bfa39c86fb9b351445db upstream. Commit a695cb58162 "tracing: Prevent deleting instances when they are being read" tried to fix a race between deleting a trace instance and reading contents of a trace file. But it wasn't good enough. The following could crash the kernel: # cd /sys/kernel/debug/tracing/instances # ( while :; do mkdir foo; rmdir foo; done ) & # ( while :; do cat foo/trace &> /dev/null; done ) & Luckily this can only be done by root user, but it should be fixed regardless. The problem is that a delete of the file can happen after the reader starts to open the file but before it grabs the trace_types_mutex. The solution is to validate the trace array before using it. If the trace array does not exist in the list of trace arrays, then it returns -ENODEV. There's a possibility that a trace_array could be deleted and a new one created and the open would open its file instead. But that is very minor as it will just return the data of the new trace array, it may confuse the user but it will not crash the system. As this can only be done by root anyway, the race will only occur if root is deleting what its trying to read at the same time. Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25tracing: Protect ftrace_trace_arrays list in trace_events.cAlexander Z Lam
commit a82274151af2b075163e3c42c828529dee311487 upstream. There are multiple places where the ftrace_trace_arrays list is accessed in trace_events.c without the trace_types_lock held. Link: http://lkml.kernel.org/r/1372732674-22726-1-git-send-email-azl@google.com Signed-off-by: Alexander Z Lam <azl@google.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25tracing: Make trace_marker use the correct per-instance bufferAlexander Z Lam
commit 2d71619c59fac95a5415a326162fa046161b938c upstream. The trace_marker file was present for each new instance created, but it added the trace mark to the global trace buffer instead of to the instance's buffer. Link: http://lkml.kernel.org/r/1372717885-4543-2-git-send-email-azl@google.com Signed-off-by: Alexander Z Lam <azl@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25tracing: Fix irqs-off tag display in syscall tracingzhangwei(Jovi)
commit 11034ae9c20f4057a6127fc965906417978e69b2 upstream. All syscall tracing irqs-off tags are wrong, the syscall enter entry doesn't disable irqs. [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event [root@jovi tracing]# cat trace # tracer: nop # # entries-in-buffer/entries-written: 13/13 #P:2 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | irqbalance-513 [000] d... 56115.496766: sys_open(filename: 804e1a6, flags: 0, mode: 1b6) irqbalance-513 [000] d... 56115.497008: sys_open(filename: 804e1bb, flags: 0, mode: 1b6) sendmail-771 [000] d... 56115.827982: sys_open(filename: b770e6d1, flags: 0, mode: 1b6) The reason is syscall tracing doesn't record irq_flags into buffer. The proper display is: [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event [root@jovi tracing]# cat trace # tracer: nop # # entries-in-buffer/entries-written: 14/14 #P:2 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | irqbalance-514 [001] .... 46.213921: sys_open(filename: 804e1a6, flags: 0, mode: 1b6) irqbalance-514 [001] .... 46.214160: sys_open(filename: 804e1bb, flags: 0, mode: 1b6) <...>-920 [001] .... 47.307260: sys_open(filename: 4e82a0c5, flags: 80000, mode: 0) Link: http://lkml.kernel.org/r/1365564393-10972-3-git-send-email-jovi.zhangwei@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-25tracing: Failed to create system directorySteven Rostedt
commit 6e94a780374ed31b280f939d4757e8d7858dff16 upstream. Running the following: # cd /sys/kernel/debug/tracing # echo p:i do_sys_open > kprobe_events # echo p:j schedule >> kprobe_events # cat kprobe_events p:kprobes/i do_sys_open p:kprobes/j schedule # echo p:i do_sys_open >> kprobe_events # cat kprobe_events p:kprobes/j schedule p:kprobes/i do_sys_open # ls /sys/kernel/debug/tracing/events/kprobes/ enable filter j Notice that the 'i' is missing from the kprobes directory. The console produces: "Failed to create system directory kprobes" This is because kprobes passes in a allocated name for the system and the ftrace event subsystem saves off that name instead of creating a duplicate for it. But the kprobes may free the system name making the pointer to it invalid. This bug was introduced by 92edca073c37 "tracing: Use direct field, type and system names" which switched from using kstrdup() on the system name in favor of just keeping apointer to it, as the internal ftrace event system names are static and exist for the life of the computer being booted. Instead of reverting back to duplicating system names again, we can use core_kernel_data() to determine if the passed in name was allocated or static. Then use the MSB of the ref_count to be a flag to keep track if the name was allocated or not. Then we can still save from having to duplicate strings that will always exist, but still copy the ones that may be freed. Reported-by: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>