linux-toradex.git/include/linux, branch v3.2.49

perf: Fix perf mmap bugs

2013-07-27T04:34:32+00:00

commit 26cb63ad11e04047a64309362674bcbbd6a6f246 upstream.

Vince reported a problem found by his perf specific trinity
fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma->vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't
think there's code out there that uses this; we didn't hear
about weird accounting problems/crashes. If we do need this to
work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example
prog was indeed doing a double mmap() through the use of
perf_event_set_output().

This exposes another problem, since we now have 2 events with
one buffer, the accounting gets screwy because we account per
event. Fix this by making the buffer responsible for its own
accounting.

[Backporting for 3.4-stable.
VM_RESERVED flag was replaced with pair 'VM_DONTEXPAND | VM_DONTDUMP' in
314e51b9 since 3.7.0-rc1, and 314e51b9 comes from a big patchset, we didn't
backport the patchset, so I restored 'VM_DNOTEXPAND | VM_DONTDUMP' as before:
-       vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+       vma->vm_flags |= VM_DONTCOPY | VM_RESERVED;
 -- zliu]

Reported-by: Vince Weaver 
Signed-off-by: Peter Zijlstra 
Cc: Al Viro 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Zhouping Liu 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings

nbd: correct disconnect behavior

2013-07-27T04:34:29+00:00

commit c378f70adbc1bbecd9e6db145019f14b2f688c7c upstream.

Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes).  This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.

This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect.  This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).

Signed-off-by: Paul Clements 
Acked-by: Rob Landley 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2: adjust device pointer name]
Signed-off-by: Ben Hutchings

cgroup: fix RCU accesses to task->cgroups

2013-07-27T04:34:20+00:00

commit 14611e51a57df10240817d8ada510842faf0ec51 upstream.

task->cgroups is a RCU pointer pointing to struct css_set.  A task
switches to a different css_set on cgroup migration but a css_set
doesn't change once created and its pointers to cgroup_subsys_states
aren't RCU protected.

task_subsys_state[_check]() is the macro to acquire css given a task
and subsys_id pair.  It RCU-dereferences task->cgroups->subsys[] not
task->cgroups, so the RCU pointer task->cgroups ends up being
dereferenced without read_barrier_depends() after it.  It's broken.

Fix it by introducing task_css_set[_check]() which does
RCU-dereference on task->cgroups.  task_subsys_state[_check]() is
reimplemented to directly dereference ->subsys[] of the css_set
returned from task_css_set[_check]().

This removes some of sparse RCU warnings in cgroup.

v2: Fixed unbalanced parenthsis and there's no need to use
    rcu_dereference_raw() when !CONFIG_PROVE_RCU.  Both spotted by Li.

Signed-off-by: Tejun Heo 
Reported-by: Fengguang Wu 
Acked-by: Li Zefan 
[bwh: Backported to 3.2:
 - Adjust context
 - Remove CONFIG_PROVE_RCU condition
 - s/lockdep_is_held(&cgroup_mutex)/cgroup_lock_is_held()/]
Signed-off-by: Ben Hutchings

futex: Take hugepages into account when generating futex_key

2013-07-27T04:34:18+00:00

commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec upstream.

The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.

Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.

To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in an hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page->index.

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi 
Reviewed-by: Jiang Biao 
Tested-by: Ma Chenggong 
Reviewed-by: 'Mel Gorman' 
Acked-by: 'Darren Hart' 
Cc: 'Peter Zijlstra' 
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

net: force a reload of first item in hlist_nulls_for_each_entry_rcu

2013-06-29T03:06:40+00:00

[ Upstream commit c87a124a5d5e8cf8e21c4363c3372bcaf53ea190 ]

Roman Gushchin discovered that udp4_lib_lookup2() was not reloading
first item in the rcu protected list, in case the loop was restarted.

This produced soft lockups as in https://lkml.org/lkml/2013/4/16/37

rcu_dereference(X)/ACCESS_ONCE(X) seem to not work as intended if X is
ptr->field :

In some cases, gcc caches the value or ptr->field in a register.

Use a barrier() to disallow such caching, as documented in
Documentation/atomic_ops.txt line 114

Thanks a lot to Roman for providing analysis and numerous patches.

Diagnosed-by: Roman Gushchin 
Signed-off-by: Eric Dumazet 
Reported-by: Boris Zhmurov 
Signed-off-by: Roman Gushchin 
Acked-by: Paul E. McKenney 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg

2013-06-29T03:06:39+00:00

[ Upstream commits 1be374a0518a288147c6a7398792583200a67261 and
  a7526eb5d06b0084ef12d7b168d008fcf516caab ]

MSG_CMSG_COMPAT is (AFAIK) not intended to be part of the API --
it's a hack that steals a bit to indicate to other networking code
that a compat entry was used.  So don't allow it from a non-compat
syscall.

This prevents an oops when running this code:

int main()
{
	int s;
	struct sockaddr_in addr;
	struct msghdr *hdr;

	char *highpage = mmap((void*)(TASK_SIZE_MAX - 4096), 4096,
	                      PROT_READ | PROT_WRITE,
	                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (highpage == MAP_FAILED)
		err(1, "mmap");

	s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
	if (s == -1)
		err(1, "socket");

        addr.sin_family = AF_INET;
        addr.sin_port = htons(1);
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (connect(s, (struct sockaddr*)&addr, sizeof(addr)) != 0)
		err(1, "connect");

	void *evil = highpage + 4096 - COMPAT_MSGHDR_SIZE;
	printf("Evil address is %p\n", evil);

	if (syscall(__NR_sendmmsg, s, evil, 1, MSG_CMSG_COMPAT) < 0)
		err(1, "sendmmsg");

	return 0;
}

Cc: David S. Miller 
Signed-off-by: Andy Lutomirski 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

mm: migration: add migrate_entry_wait_huge()

2013-06-19T01:17:01+00:00

commit 30dad30922ccc733cfdbfe232090cf674dc374dc upstream.

When we have a page fault for the address which is backed by a hugepage
under migration, the kernel can't wait correctly and do busy looping on
hugepage fault until the migration finishes.  As a result, users who try
to kick hugepage migration (via soft offlining, for example) occasionally
experience long delay or soft lockup.

This is because pte_offset_map_lock() can't get a correct migration entry
or a correct page table lock for hugepage.  This patch introduces
migration_entry_wait_huge() to solve this.

Signed-off-by: Naoya Horiguchi 
Reviewed-by: Rik van Riel 
Reviewed-by: Wanpeng Li 
Reviewed-by: Michal Hocko 
Cc: Mel Gorman 
Cc: Andi Kleen 
Cc: KOSAKI Motohiro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

CPU hotplug: provide a generic helper to disable/enable CPU hotplug

2013-06-19T01:16:59+00:00

commit 16e53dbf10a2d7e228709a7286310e629ede5e45 upstream.

There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation.  Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.

Restructure the code and make it generic enough to be useful for other
usecases too.

Signed-off-by: Srivatsa S. Bhat 
Signed-off-by: Robin Holt 
Cc: H. Peter Anvin 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Russ Anderson 
Cc: Robin Holt 
Cc: Russell King 
Cc: Guan Xuetao 
Cc: Shawn Guo 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

net: Add net_ratelimited_function and net__ratelimited macros

2013-06-19T01:16:44+00:00

commit 3a3bfb61e64476ff1e4ac3122cb6dec9c79b795c upstream.

__ratelimit() can be considered an inverted bool test because
it returns true when not ratelimited.  Several tests in the
kernel tree use this __ratelimit() function incorrectly.

No net_ratelimit uses are incorrect currently though.

Most uses of net_ratelimit are to log something via printk or
pr_.

In order to minimize the uses of net_ratelimit, and to start
standardizing the code style used for __ratelimit() and net_ratelimit(),
add a net_ratelimited_function() macro and net__ratelimited()
logging macros similar to pr__ratelimited that use the global
net_ratelimit instead of a static per call site "struct ratelimit_state".

Signed-off-by: Joe Perches 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

macvlan: fix passthru mode race between dev removal and rx path

2013-05-30T13:35:14+00:00

[ Upstream commit 233c7df0821c4190e2d3f4be0f2ca0ab40a5ed8c, note
  that I had to add list_first_or_null_rcu to rculist.h in order
  to accomodate this fix. ]

Currently, if macvlan in passthru mode is created and data are rxed and
you remove this device, following panic happens:

NULL pointer dereference at 0000000000000198
IP: [] macvlan_handle_frame+0x153/0x1f7 [macvlan]

I'm using following script to trigger this:


I run this script while "ping -f" is running on another machine to send
packets to e1 rx.

Reason of the panic is that list_first_entry() is blindly called in
macvlan_handle_frame() even if the list was empty. vlan is set to
incorrect pointer which leads to the crash.

I'm fixing this by protecting port->vlans list by rcu and by preventing
from getting incorrect pointer in case the list is empty.

Introduced by: commit eb06acdc85585f2 "macvlan: Introduce 'passthru' mode to takeover the underlying device"

Signed-off-by: Jiri Pirko 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings