linux-toradex.git/fs/proc, branch v3.14.3

fs/proc/base.c: fix GPF in /proc/$PID/map_files

2014-03-11T00:26:20+00:00

The expected logic of proc_map_files_get_link() is either to return 0
and initialize 'path' or return an error and leave 'path' uninitialized.

By the time dname_to_vma_addr() returns 0 the corresponding vma may have
already be gone.  In this case the path is not initialized but the
return value is still 0.  This results in 'general protection fault'
inside d_path().

Steps to reproduce:

  CONFIG_CHECKPOINT_RESTORE=y

    fd = open(...);
    while (1) {
        mmap(fd, ...);
        munmap(fd, ...);
    }

  ls -la /proc/$PID/map_files

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991

Signed-off-by: Artem Fetishev 
Signed-off-by: Aleksandr Terekhov 
Reported-by: 
Acked-by: Pavel Emelyanov 
Acked-by: Cyrill Gorcunov 
Reviewed-by: "Eric W. Biederman" 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: close PageTail race

2014-03-04T15:55:47+00:00

Commit bf6bddf1924e ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page).  Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation.  This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling.  The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.

Signed-off-by: David Rientjes 
Cc: Holger Kiehl 
Cc: Christoph Lameter 
Cc: Rafael Aquini 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Andrea Arcangeli 
Cc: Rik van Riel 
Cc: "Kirill A. Shutemov" 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-02-11T00:01:40+00:00

Currently, update_note_header_size_elf64() and
update_note_header_size_elf32() will add the size of a PT_NOTE entry to
real_sz even if that causes real_sz to exceeds max_sz.  This patch
corrects the while loop logic in those routines to ensure that does not
happen and prints a warning if a PT_NOTE entry is dropped.  If zero
PT_NOTE entries are found or this condition is encountered because the
only entry was dropped, a warning is printed and an error is returned.

One possible negative side effect of exceeding the max_sz limit is an
allocation failure in merge_note_headers_elf64() or
merge_note_headers_elf32() which would produce console output such as
the following while booting the crash kernel.

  vmalloc: allocation failure: 14076997632 bytes
  swapper/0: page allocation failure: order:0, mode:0x80d2
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-gbp1 #7
  Call Trace:
    dump_stack+0x19/0x1b
    warn_alloc_failed+0xf0/0x160
    __vmalloc_node_range+0x19e/0x250
    vmalloc_user+0x4c/0x70
    merge_note_headers_elf64.constprop.9+0x116/0x24a
    vmcore_init+0x2d4/0x76c
    do_one_initcall+0xe2/0x190
    kernel_init_freeable+0x17c/0x207
    kernel_init+0xe/0x180
    ret_from_fork+0x7c/0xb0

  Kdump: vmcore not initialized

  kdump: dump target is /dev/sda4
  kdump: saving to /sysroot//var/crash/127.0.0.1-2014.01.28-13:58:52/
  kdump: saving vmcore-dmesg.txt
  Cannot open /proc/vmcore: No such file or directory
  kdump: saving vmcore-dmesg.txt failed
  kdump: saving vmcore
  kdump: saving vmcore failed

This type of failure has been seen on a four socket prototype system
with certain memory configurations.  Most PT_NOTE sections have a single
entry similar to:

  n_namesz = 0x5
  n_descsz = 0x150
  n_type   = 0x1

Occasionally, a second entry is encountered with very large n_namesz and
n_descsz sizes:

  n_namesz = 0x80000008
  n_descsz = 0x510ae163
  n_type   = 0x80000008

Not yet sure of the source of these extra entries, they seem bogus, but
they shouldn't cause crash dump to fail.

Signed-off-by: Greg Pearson 
Acked-by: Vivek Goyal 
Cc: HATAYAMA Daisuke 
Cc: Michael Holzheu 
Cc: "Eric W. Biederman" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs/proc/array.c: change do_task_stat() to use while_each_thread()

2014-01-24T00:37:02+00:00

Change the remaining next_thread (ab)users to use while_each_thread().

The last user which should be changed is next_tid(), but we can't do this
now.

__exit_signal() and complete_signal() are fine, they actually need
next_thread() logic.

This patch (of 3):

do_task_stat() can use while_each_thread(), no changes in
the compiled code.

Signed-off-by: Oleg Nesterov 
Cc: KOSAKI Motohiro 
Cc: Al Viro 
Cc: Kees Cook 
Reviewed-by: Sameer Nanda 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs/proc: don't use module_init for non-modular core code

2014-01-24T00:37:02+00:00

PROC_FS is a bool, so this code is either present or absent.  It will
never be modular, so using module_init as an alias for __initcall is
rather misleading.

Fix this up now, so that we can relocate module_init from init.h into
module.h in the future.  If we don't do this, we'd have to add module.h to
obviously non-modular code, and that would be ugly at best.

Note that direct use of __initcall is discouraged, vs.  one of the
priority categorized subgroups.  As __initcall gets mapped onto
device_initcall, our use of fs_initcall (which makes sense for fs code)
will thus change these registrations from level 6-device to level 5-fs
(i.e.  slightly earlier).  However no observable impact of that small
difference has been observed during testing, or is expected.

Also note that this change uncovers a missing semicolon bug in the
registration of vmcore_init as an initcall.

Signed-off-by: Paul Gortmaker 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs/proc/proc_devtree.c: remove empty /proc/device-tree when no openfirmware exists.

2014-01-24T00:37:01+00:00

Distribution kernels might want to build in support for /proc/device-tree
for kernels that might end up running on hardware that doesn't support
openfirmware.  This results in an empty /proc/device-tree existing.
Remove it if the OFW root node doesn't exist.

This situation actually confuses grub2, resulting in install failures.
grub2 sees the /proc/device-tree and picks the wrong install target cf.
http://bzr.savannah.gnu.org/lh/grub/trunk/grub/annotate/4300/util/grub-install.in#L311
grub should be more robust, but still, leaving an empty proc dir seems
pointless.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=818378.

Signed-off-by: Dave Jones 
Cc: Al Viro 
Cc: Paul Mackerras 
Cc: Josh Boyer 
Cc: Benjamin Herrenschmidt 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

proc: set attributes of pde using accessor functions

2014-01-24T00:37:01+00:00

Use existing accessors proc_set_user() and proc_set_size() to set
attributes.  Just a cleanup.

Signed-off-by: Rui Xiang 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

proc: fix ->f_pos overflows in first_tid()

2014-01-24T00:37:01+00:00

1. proc_task_readdir()->first_tid() path truncates f_pos to int, this
   is wrong even on 64bit.

   We could check that f_pos < PID_MAX or even INT_MAX in
   proc_task_readdir(), but this patch simply checks the potential
   overflow in first_tid(), this check is nop on 64bit.  We do not care if
   it was negative and the new unsigned value is huge, all we need to
   ensure is that we never wrongly return !NULL.

2. Remove the 2nd "nr != 0" check before get_nr_threads(),
   nr_threads == 0 is not distinguishable from !pid_task() above.

Signed-off-by: Oleg Nesterov 
Cc: "Eric W. Biederman" 
Cc: Michal Hocko 
Cc: Sameer Nanda 
Cc: Sergey Dyasly 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

proc: don't (ab)use ->group_leader in proc_task_readdir() paths

2014-01-24T00:37:01+00:00

proc_task_readdir() does not really need "leader", first_tid() has to
revalidate it anyway.  Just pass proc_pid(inode) to first_tid() instead,
it can do pid_task(PIDTYPE_PID) itself and read ->group_leader only if
necessary.

The patch also extracts the "inode is dead" code from
pid_delete_dentry(dentry) into the new trivial helper,
proc_inode_is_dead(inode), proc_task_readdir() uses it to return -ENOENT
if this dir was removed.

This is a bit racy, but the race is very inlikely and the getdents() after
openndir() can see the empty "." + ".." dir only once.

Signed-off-by: Oleg Nesterov 
Cc: "Eric W. Biederman" 
Cc: Michal Hocko 
Cc: Sameer Nanda 
Cc: Sergey Dyasly 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

proc: change first_tid() to use while_each_thread() rather than next_thread()

2014-01-24T00:37:01+00:00

Rerwrite the main loop to use while_each_thread() instead of
next_thread().  We are going to fix or replace while_each_thread(),
next_thread() should be avoided whenever possible.

Signed-off-by: Oleg Nesterov 
Cc: "Eric W. Biederman" 
Cc: Michal Hocko 
Cc: Sameer Nanda 
Cc: Sergey Dyasly 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds