summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-11-11slub: Handle NULL parameter in kmem_cache_flagsChristoph Lameter
Andreas Herrmann writes: When I've used slub_debug kernel option (e.g. "slub_debug=,skbuff_fclone_cache" or similar) on a debug session I've seen a panic like: Highbank #setenv bootargs console=ttyAMA0 root=/dev/sda2 kgdboc.kgdboc=ttyAMA0,115200 slub_debug=,kmalloc-4096 earlyprintk=ttyAMA0 ... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c0004000 [00000000] *pgd=00000000 Internal error: Oops: 5 [#1] SMP ARM Modules linked in: CPU: 0 PID: 0 Comm: swapper Tainted: G W 3.12.0-00048-gbe408cd #314 task: c0898360 ti: c088a000 task.ti: c088a000 PC is at strncmp+0x1c/0x84 LR is at kmem_cache_flags.isra.46.part.47+0x44/0x60 pc : [<c02c6da0>] lr : [<c0110a3c>] psr: 200001d3 sp : c088bea8 ip : c088beb8 fp : c088beb4 r10: 00000000 r9 : 413fc090 r8 : 00000001 r7 : 00000000 r6 : c2984a08 r5 : c0966e78 r4 : 00000000 r3 : 0000006b r2 : 0000000c r1 : 00000000 r0 : c2984a08 Flags: nzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 0000404a DAC: 00000015 Process swapper (pid: 0, stack limit = 0xc088a248) Stack: (0xc088bea8 to 0xc088c000) bea0: c088bed4 c088beb8 c0110a3c c02c6d90 c0966e78 00000040 bec0: ef001f00 00000040 c088bf14 c088bed8 c0112070 c0110a04 00000005 c010fac8 bee0: c088bf5c c088bef0 c010fac8 ef001f00 00000040 00000000 00000040 00000001 bf00: 413fc090 00000000 c088bf34 c088bf18 c0839190 c0112040 00000000 ef001f00 bf20: 00000000 00000000 c088bf54 c088bf38 c0839200 c083914c 00000006 c0961c4c bf40: c0961c28 00000000 c088bf7c c088bf58 c08392ac c08391c0 c08a2ed8 c0966e78 bf60: c086b874 c08a3f50 c0961c28 00000001 c088bfb4 c088bf80 c083b258 c0839248 bf80: 2f800000 0f000000 c08935b4 ffffffff c08cd400 ffffffff c08cd400 c0868408 bfa0: c29849c0 00000000 c088bff4 c088bfb8 c0824974 c083b1e4 ffffffff ffffffff bfc0: c08245c0 00000000 00000000 c0868408 00000000 10c5387d c0892bcc c0868404 bfe0: c0899440 0000406a 00000000 c088bff8 00008074 c0824824 00000000 00000000 [<c02c6da0>] (strncmp+0x1c/0x84) from [<c0110a3c>] (kmem_cache_flags.isra.46.part.47+0x44/0x60) [<c0110a3c>] (kmem_cache_flags.isra.46.part.47+0x44/0x60) from [<c0112070>] (__kmem_cache_create+0x3c/0x410) [<c0112070>] (__kmem_cache_create+0x3c/0x410) from [<c0839190>] (create_boot_cache+0x50/0x74) [<c0839190>] (create_boot_cache+0x50/0x74) from [<c0839200>] (create_kmalloc_cache+0x4c/0x88) [<c0839200>] (create_kmalloc_cache+0x4c/0x88) from [<c08392ac>] (create_kmalloc_caches+0x70/0x114) [<c08392ac>] (create_kmalloc_caches+0x70/0x114) from [<c083b258>] (kmem_cache_init+0x80/0xe0) [<c083b258>] (kmem_cache_init+0x80/0xe0) from [<c0824974>] (start_kernel+0x15c/0x318) [<c0824974>] (start_kernel+0x15c/0x318) from [<00008074>] (0x8074) Code: e3520000 01a00002 089da800 e5d03000 (e5d1c000) ---[ end trace 1b75b31a2719ed1d ]--- Kernel panic - not syncing: Fatal exception Problem is that slub_debug option is not parsed before create_boot_cache is called. Solve this by changing slub_debug to early_param. Kernels 3.11, 3.10 are also affected. I am not sure about older kernels. Christoph Lameter explains: kmem_cache_flags may be called with NULL parameter during early boot. Skip the test in that case. Cc: stable@vger.kernel.org # 3.10 and 3.11 Reported-by: Andreas Herrmann <andreas.herrmann@calxeda.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-11-11Merge branch 'slab/struct-page' into slab/nextPekka Enberg
2013-10-30slab: replace non-existing 'struct freelist *' with 'void *'Joonsoo Kim
There is no 'strcut freelist', but codes use pointer to 'struct freelist'. Although compiler doesn't complain anything about this wrong usage and codes work fine, but fixing it is better. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-30slab: fix to calm down kmemleak warningJoonsoo Kim
After using struct page as slab management, we should not call kmemleak_scan_area(), since struct page isn't the tracking object of kmemleak. Without this patch and if CONFIG_DEBUG_KMEMLEAK is enabled, so many kmemleak warnings are printed. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slub: proper kmemleak tracking if CONFIG_SLUB_DEBUG disabledRoman Bobniev
Move all kmemleak calls into hook functions, and make it so that all hooks (both inside and outside of #ifdef CONFIG_SLUB_DEBUG) call the appropriate kmemleak routines. This allows for kmemleak to be configured independently of slub debug features. It also fixes a bug where kmemleak was only partially enabled in some configurations. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Roman Bobniev <Roman.Bobniev@sonymobile.com> Signed-off-by: Tim Bird <tim.bird@sonymobile.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: rename slab_bufctl to slab_freelistJoonsoo Kim
Now, bufctl is not proper name to this array. So change it. Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: remove useless statement for checking pfmemallocJoonsoo Kim
Now, virt_to_page(page->s_mem) is same as the page, because slab use this structure for management. So remove useless statement. Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: use struct page for slab managementJoonsoo Kim
Now, there are a few field in struct slab, so we can overload these over struct page. This will save some memory and reduce cache footprint. After this change, slabp_cache and slab_size no longer related to a struct slab, so rename them as freelist_cache and freelist_size. These changes are just mechanical ones and there is no functional change. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: replace free and inuse in struct slab with newly introduced activeJoonsoo Kim
Now, free in struct slab is same meaning as inuse. So, remove both and replace them with active. Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: remove SLAB_LIMITJoonsoo Kim
It's useless now, so remove it. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: remove kmem_bufctl_tJoonsoo Kim
Now, we changed the management method of free objects of the slab and there is no need to use special value, BUFCTL_END, BUFCTL_FREE and BUFCTL_ACTIVE. So remove them. Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: change the management method of free objects of the slabJoonsoo Kim
Current free objects management method of the slab is weird, because it touch random position of the array of kmem_bufctl_t when we try to get free object. See following example. struct slab's free = 6 kmem_bufctl_t array: 1 END 5 7 0 4 3 2 To get free objects, we access this array with following pattern. 6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END If we have many objects, this array would be larger and be not in the same cache line. It is not good for performance. We can do same thing through more easy way, like as the stack. Only thing we have to do is to maintain stack top to free object. I use free field of struct slab for this purpose. After that, if we need to get an object, we can get it at stack top and manipulate top pointer. That's all. This method already used in array_cache management. Following is an access pattern when we use this method. struct slab's free = 0 kmem_bufctl_t array: 6 3 7 2 5 4 0 1 To get free objects, we access this array with following pattern. 0 -> 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 This may help cache line footprint if slab has many objects, and, in addition, this makes code much much simpler. Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: use __GFP_COMP flag for allocating slab pagesJoonsoo Kim
If we use 'struct page' of first page as 'struct slab', there is no advantage not to use __GFP_COMP. So use __GFP_COMP flag for all the cases. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: use well-defined macro, virt_to_slab()Joonsoo Kim
This is trivial change, just use well-defined macro. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: overloading the RCU head over the LRU for RCU freeJoonsoo Kim
With build-time size checking, we can overload the RCU head over the LRU of struct page to free pages of a slab in rcu context. This really help to implement to overload the struct slab over the struct page and this eventually reduce memory usage and cache footprint of the SLAB. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: remove cachep in struct slab_rcuJoonsoo Kim
We can get cachep using page in struct slab_rcu, so remove it. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: remove nodeid in struct slabJoonsoo Kim
We can get nodeid using address translation, so this field is not useful. Therefore, remove it. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: remove colouroff in struct slabJoonsoo Kim
Now there is no user colouroff, so remove it. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: change return type of kmem_getpages() to struct pageJoonsoo Kim
It is more understandable that kmem_getpages() return struct page. And, with this, we can reduce one translation from virt addr to page and makes better code than before. Below is a change of this patch. * Before text data bss dec hex filename 22123 23434 4 45561 b1f9 mm/slab.o * After text data bss dec hex filename 22074 23434 4 45512 b1c8 mm/slab.o And this help following patch to remove struct slab's colouroff. Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-10-24slab: correct pfmemalloc checkJoonsoo Kim
We checked pfmemalloc by slab unit, not page unit. You can see this in is_slab_pfmemalloc(). So other pages don't need to be set/cleared pfmemalloc. And, therefore we should check pfmemalloc in page flag of first page, but current implementation don't do that. virt_to_head_page(obj) just return 'struct page' of that object, not one of first page, since the SLAB don't use __GFP_COMP when CONFIG_MMU. To get 'struct page' of first page, we first get a slab and try to get it via virt_to_head_page(slab->s_mem). Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@iki.fi>
2013-09-02Linux 3.11v3.11Linus Torvalds
2013-09-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "This is a bug fix for the pm80xx driver. It turns out that when the new hardware support was added in 3.10 the IO command size was kept at the old hard coded value. This means that the driver attaches to some new cards and then simply hangs the system" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] pm80xx: fix Adaptec 71605H hang
2013-09-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot fix from Peter Anvin: "A single very small boot fix for very large memory systems (> 0.5T)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
2013-09-02Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds
Pull slave-dma fix from Vinod Koul: "A fix for resolving TI_EDMA driver's build error in allmodconfig to have filter function built in"" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: dma/Kconfig: TI_EDMA needs to be boolean
2013-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) There was a simplification in the ipv6 ndisc packet sending attempted here, which avoided using memory accounting on the per-netns ndisc socket for sending NDISC packets. It did fix some important issues, but it causes regressions so it gets reverted here too. Specifically, the problem with this change is that the IPV6 output path really depends upon there being a valid skb->sk attached. The reason we want to do this change in some form when we figure out how to do it right, is that if a device goes down the ndisc_sk socket send queue will fill up and block NDISC packets that we want to send to other devices too. That's really bad behavior. Hopefully Thomas can come up with a better version of this change. 2) Fix a severe TCP performance regression by reverting a change made to dev_pick_tx() quite some time ago. From Eric Dumazet. 3) TIPC returns wrongly signed error codes, fix from Erik Hugne. 4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the skb->sk too early. Fix from Li Hongjun. 5) RAW ipv4 sockets can use the wrong routing key during lookup, from Chris Clark. 6) Similar to #1 revert an older change that tried to use plain alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner mark which needs to see the skb->sk for such frames. From Phil Oester. 7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz, specifically in the handling of virtual functions. 8) IPSEC path error propagations to sockets is not done properly when we have v4 in v6, and v6 in v4 type rules. Fix from Hannes Frederic Sowa. 9) Fix missing channel context release in mac80211, from Johannes Berg. 10) Fix network namespace handing wrt. SCM_RIGHTS, from Andy Lutomirski. 11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic drivers. From Michal Schmidt. 12) Hopefully a complete and correct fix for the genetlink dump locking and module reference counting. From Pravin B Shelar. 13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir. 14) Fix handling of timestamp offset when restoring a snapshotted TCP socket. From Andrew Vagin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) net: fec: fix time stamping logic after napi conversion net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay mISDN: return -EINVAL on error in dsp_control_req() net: revert 8728c544a9c ("net: dev_pick_tx() fix") Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages" ipv4 tunnels: fix an oops when using ipip/sit with IPsec tipc: set sk_err correctly when connection fails tcp: tcp_make_synack() should use sock_wmalloc bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones ipv6: Don't depend on per socket memory for neighbour discovery messages ipv4: sendto/hdrincl: don't use destination address found in header tcp: don't apply tsoffset if rcv_tsecr is zero tcp: initialize rcv_tstamp for restored sockets net: xilinx: fix memleak net: usb: Add HP hs2434 device to ZLP exception table net: add cpu_relax to busy poll loop net: stmmac: fixed the pbl setting with DT genl: Hold reference on correct module while netlink-dump. genl: Fix genl dumpit() locking. xfrm: Fix potential null pointer dereference in xdst_queue_output ...
2013-08-30MAINTAINERS: change my DT related maintainer addressIan Campbell
Filtering capabilities on my work email are pretty much non-existent and this has turned out to be something of a firehose... Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Olof Johansson <olof@lixom.net> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Pawel Moll <pawel.moll@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-30Merge tag 'sound-3.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This contains two Oops fixes (opti9xx and HD-audio) and a simple fixup for an Acer laptop. All marked as stable patches" * tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: opti9xx: Fix conflicting driver object name ALSA: hda - Fix NULL dereference with CONFIG_SND_DYNAMIC_MINORS=n ALSA: hda - Add inverted digital mic fixup for Acer Aspire One
2013-08-30Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Two straggling fixes that I had missed as they were posted a couple of weeks ago, causing problems with interrupts (breaking them completely) on the CSR SiRF platforms" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm: prima2: drop nr_irqs in mach as we moved to linear irqdomain irqchip: sirf: move from legacy mode to linear irqdomain
2013-08-30Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Since we are getting to the pointy end, one i915 black screen on some machines, and one vmwgfx stop userspace ability to nuke the VM, There might be one or two ati or nouveau fixes trickle in before final, but I think this should pretty much be it" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/vmwgfx: Split GMR2_REMAP commands if they are to large drm/i915: ivb: fix edp voltage swing reg val
2013-08-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input layer updates from Dmitry Torokhov: "Just a couple of new IDs in Wacom and xpad drivers, i8042 is now disabled on ARC, and data checks in Elantech driver that were overly relaxed by the previous patch are now tightened" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: i8042 - disable the driver on ARC platforms Input: xpad - add signature for Razer Onza Classic Edition Input: elantech - fix packet check for v3 and v4 hardware Input: wacom - add support for 0x300 and 0x301
2013-08-30net: fec: fix time stamping logic after napi conversionRichard Cochran
Commit dc975382 "net: fec: add napi support to improve proformance" converted the fec driver to the napi model. However, that commit forgot to remove the call to skb_defer_rx_timestamp which is only needed in non-napi drivers. (The function napi_gro_receive eventually calls netif_receive_skb, which in turn calls skb_defer_rx_timestamp.) This patch should also be applied to the 3.9 and 3.10 kernels. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delayDaniel Borkmann
While looking into MLDv1/v2 code, I noticed that bridging code does not convert it's max delay into jiffies for MLDv2 messages as we do in core IPv6' multicast code. RFC3810, 5.1.3. Maximum Response Code says: The Maximum Response Code field specifies the maximum time allowed before sending a responding Report. The actual time allowed, called the Maximum Response Delay, is represented in units of milliseconds, and is derived from the Maximum Response Code as follows: [...] As we update timers that work with jiffies, we need to convert it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Linus Lüssing <linus.luessing@web.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30mISDN: return -EINVAL on error in dsp_control_req()Dan Carpenter
If skb->len is too short then we should return an error. Otherwise we read beyond the end of skb->data for several bytes. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30net: revert 8728c544a9c ("net: dev_pick_tx() fix")Eric Dumazet
commit 8728c544a9cbdc ("net: dev_pick_tx() fix") and commit b6fe83e9525a ("bonding: refine IFF_XMIT_DST_RELEASE capability") are quite incompatible : Queue selection is disabled because skb dst was dropped before entering bonding device. This causes major performance regression, mainly because TCP packets for a given flow can be sent to multiple queues. This is particularly visible when using the new FQ packet scheduler with MQ + FQ setup on the slaves. We can safely revert the first commit now that 416186fbf8c5b ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx") properly caps the queue_index. Reported-by: Xi Wang <xii@google.com> Diagnosed-by: Xi Wang <xii@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30Revert "ipv6: Don't depend on per socket memory for neighbour discovery ↵David S. Miller
messages" This reverts commit 1f324e38870cc09659cf23bc626f1b8869e201f2. It seems to cause regressions, and in particular the output path really depends upon there being a socket attached to skb->sk for checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2(). Reported-by: Stephen Warren <swarren@wwwdotorg.org> Reported-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30ipv4 tunnels: fix an oops when using ipip/sit with IPsecLi Hongjun
Since commit 3d7b46cd20e3 (ip_tunnel: push generic protocol handling to ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on an IPv4 over IPv4 tunnel. xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb). Signed-off-by: Li Hongjun <hongjun.li@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30tipc: set sk_err correctly when connection failsErik Hugne
Should a connect fail, if the publication/server is unavailable or due to some other error, a positive value will be returned and errno is never set. If the application code checks for an explicit zero return from connect (success) or a negative return (failure), it will not catch the error and subsequent send() calls will fail as shown from the strace snippet below. socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3 connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111 sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe) The reason for this behaviour is that TIPC wrongly inverts error codes set in sk_err. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30tcp: tcp_make_synack() should use sock_wmallocPhil Oester
In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so, the netfilter owner match lost its ability to block the SYNACK packet on outbound listening sockets. Revert the change, restoring the owner match functionality. This closes netfilter bugzilla #847. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 onesLinus Lüssing
Currently we would still potentially suffer multicast packet loss if there is just either an IGMP or an MLD querier: For the former case, we would possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is because we are currently assuming that if either an IGMP or MLD querier is present that the other one is present, too. This patch makes the behaviour and fix added in "bridge: disable snooping if there is no querier" (b00589af3b04) to also work if there is either just an IGMP or an MLD querier on the link: It refines the deactivation of the snooping to be protocol specific by using separate timers for the snooped IGMP and MLD queries as well as separate timers for our internal IGMP and MLD queriers. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29Merge branch 'for-3.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "During the percpu reference counting update which was merged during v3.11-rc1, the cgroup destruction path was updated so that a cgroup in the process of dying may linger on the children list, which was necessary as the cgroup should still be included in child/descendant iteration while percpu ref is being killed. Unfortunately, I forgot to update cgroup destruction path accordingly and cgroup destruction may fail spuriously with -EBUSY due to lingering dying children even when there's no live child left - e.g. "rmdir parent/child parent" will usually fail. This can be easily fixed by iterating through the children list to verify that there's no live child left. While this is very late in the release cycle, this bug is very visible to userland and I believe the fix is relatively safe. Thanks Hugh for spotting and providing fix for the issue" * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix rmdir EBUSY regression in 3.11
2013-08-29Merge branch 'for-3.11-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "This contains one fix which could lead to system-wide lockup on !PREEMPT kernels. It's very late in the cycle but this definitely is a -stable material. The problem is that workqueue worker tasks may process unlimited number of work items back-to-back without every yielding inbetween. This usually isn't noticeable but a work item which re-queues itself waiting for someone else to do something can deadlock with stop_machine. stop_machine will ensure nothing else happens on all other cpus and the requeueing work item will reqeueue itself indefinitely without ever yielding and thus preventing the CPU from entering stop_machine. Kudos to Jamie Liu for spotting and diagnosing the problem. This can be trivially fixed by adding cond_resched() after processing each work item" * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: cond_resched() after processing each work item
2013-08-29Merge tag 'nfs-for-3.11-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfix from Trond Myklebust: "Stable patch to fix a highmem-related data corruption issue on 32-bit ARM platforms" * tag 'nfs-for-3.11-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Fix memory corruption issue on 32-bit highmem systems
2013-08-30drm/vmwgfx: Split GMR2_REMAP commands if they are to largeJakob Bornecrantz
This fixes the piglit test texturing/max-texture-size causing the VM to die due to a too large SVGA command. Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Reviewed-by: Biran Paul <brianp@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@gmail.com>
2013-08-30Merge tag 'drm-intel-fixes-2013-08-30' of ↵Dave Airlie
git://people.freedesktop.org/~danvet/drm-intel into drm-fixes Just a one-line patch to fix a black screen issue on rare ivb machines, cc: stable. Normally I'd just shovel this into the -next pull request this late in the -rc cycle, but Linus was making noises about not getting real fixes which are cc: stable. So here we go ;-) * tag 'drm-intel-fixes-2013-08-30' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: ivb: fix edp voltage swing reg val
2013-08-30drm/i915: ivb: fix edp voltage swing reg valImre Deak
Fix the typo introduced in commit 1a2eb4604b85c5efb343da8a4dcf41288fcfca85 Author: Keith Packard <keithp@keithp.com> Date: Wed Nov 16 16:26:07 2011 -0800 drm/i915: Hook up Ivybridge eDP This fixes eDP link-training failures and cases where all voltage swing /pre-emphasis levels were tried and failed during clock recovery and - as a fallback - we go on to do channel equalization with the last voltage swing/pre-emphasis level which will succeed. Both issues can lead to a blank screen. v2: - improve commit message CC: stable@vger.kernel.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64880 Tested-by: Jeremy Moles <cubicool@gmail.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== This pull request fixes some issues that arise when 6in4 or 4in6 tunnels are used in combination with IPsec, all from Hannes Frederic Sowa and a null pointer dereference when queueing packets to the policy hold queue. 1) We might access the local error handler of the wrong address family if 6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer to the correct local_error to xfrm_state_afinet. 2) Add a helper function to always refer to the correct interpretation of skb->sk. 3) Call skb_reset_inner_headers to record the position of the inner headers when adding a new one in various ipv6 tunnels. This is needed to identify the addresses where to send back errors in the xfrm layer. 4) Dereference inner ipv6 header if encapsulated to always call the right error handler. 5) Choose protocol family by skb protocol to not call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is used in ipv4 mode. 6) Partly revert "xfrm: introduce helper for safe determination of mtu" because this introduced pmtu discovery problems. 7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs. We need this to get the correct mtu informations in xfrm. 8) Fix null pointer dereference in xdst_queue_output. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29ipv6: Don't depend on per socket memory for neighbour discovery messagesThomas Graf
Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus share a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will cosnume all of the wmem space and will thus prevent from any more skbs to be allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, simply use alloc_skb() and don't depend on any socket wmem space any longer. This patch has orginally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29ipv4: sendto/hdrincl: don't use destination address found in headerChris Clark
ipv4: raw_sendmsg: don't use header's destination address A sendto() regression was bisected and found to start with commit f8126f1d5136be1 (ipv4: Adjust semantics of rt->rt_gateway.) The problem is that it tries to ARP-lookup the constructed packet's destination address rather than the explicitly provided address. Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used. cf. commit 2ad5b9e4bd314fc685086b99e90e5de3bc59e26b Reported-by: Chris Clark <chris.clark@alcatel-lucent.com> Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com> Tested-by: Chris Clark <chris.clark@alcatel-lucent.com> Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29tcp: don't apply tsoffset if rcv_tsecr is zeroAndrew Vagin
The zero value means that tsecr is not valid, so it's a special case. tsoffset is used to customize tcp_time_stamp for one socket. tsoffset is usually zero, it's used when a socket was moved from one host to another host. Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to TCP_RTO_MAX. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29tcp: initialize rcv_tstamp for restored socketsAndrew Vagin
u32 rcv_tstamp; /* timestamp of last received ACK */ Its value used in tcp_retransmit_timer, which closes socket if the last ack was received more then TCP_RTO_MAX ago. Currently rcv_tstamp is initialized to zero and if tcp_retransmit_timer is called before receiving a first ack, the connection is closed. This patch initializes rcv_tstamp to a timestamp, when a socket was restored. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>