<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/x86/kernel/setup.c, branch v3.9.4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low</title>
<updated>2013-04-17T19:35:33+00:00</updated>
<author>
<name>Yinghai Lu</name>
<email>yinghai@kernel.org</email>
</author>
<published>2013-04-16T05:23:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=adbc742bf78695bb98c79d18c558b61571748b99'/>
<id>adbc742bf78695bb98c79d18c558b61571748b99</id>
<content type='text'>
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of
crashkernel_hign=X crashkernel_low=Y. As that could be extensible.

-v2: according to Vivek, change delimiter to ;
-v3: let hign and low only handle simple form and it conforms to
	description in kernel-parameters.txt
     still keep crashkernel=X override any crashkernel=X,high
        crashkernel=Y,low
-v4: update get_last_crashkernel returning and add more strict
     checking in parse_crashkernel_simple() found by HATAYAMA.
-v5: Change delimiter back to , according to HPA.
     also separate parse_suffix from parse_simper according to vivek.
	so we can avoid @pos in that path.
-v6: Tight the checking about crashkernel=X,highblahblah,high
     found by HTYAYAMA.

Cc: HATAYAMA Daisuke &lt;d.hatayama@jp.fujitsu.com&gt;
Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of
crashkernel_hign=X crashkernel_low=Y. As that could be extensible.

-v2: according to Vivek, change delimiter to ;
-v3: let hign and low only handle simple form and it conforms to
	description in kernel-parameters.txt
     still keep crashkernel=X override any crashkernel=X,high
        crashkernel=Y,low
-v4: update get_last_crashkernel returning and add more strict
     checking in parse_crashkernel_simple() found by HATAYAMA.
-v5: Change delimiter back to , according to HPA.
     also separate parse_suffix from parse_simper according to vivek.
	so we can avoid @pos in that path.
-v6: Tight the checking about crashkernel=X,highblahblah,high
     found by HTYAYAMA.

Cc: HATAYAMA Daisuke &lt;d.hatayama@jp.fujitsu.com&gt;
Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, kdump: Retore crashkernel= to allocate under 896M</title>
<updated>2013-04-17T19:35:33+00:00</updated>
<author>
<name>Yinghai Lu</name>
<email>yinghai@kernel.org</email>
</author>
<published>2013-04-16T05:23:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=55a20ee7804ab64ac90bcdd4e2868a42829e2784'/>
<id>55a20ee7804ab64ac90bcdd4e2868a42829e2784</id>
<content type='text'>
Vivek found old kexec-tools does not work new kernel anymore.

So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.

-v2: let crashkernel=X override crashkernel_high=
    update description about _high will be ignored by crashkernel=X
-v3: update description about kernel-parameters.txt according to Vivek.

Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Vivek found old kexec-tools does not work new kernel anymore.

So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.

-v2: let crashkernel=X override crashkernel_high=
    update description about _high will be ignored by crashkernel=X
-v3: update description about kernel-parameters.txt according to Vivek.

Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, kdump: Set crashkernel_low automatically</title>
<updated>2013-04-17T19:35:32+00:00</updated>
<author>
<name>Yinghai Lu</name>
<email>yinghai@kernel.org</email>
</author>
<published>2013-04-16T05:23:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c729de8fcea37a1c444e81857eace12494c804a9'/>
<id>c729de8fcea37a1c444e81857eace12494c804a9</id>
<content type='text'>
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y.  This causes regression
if iommu is not enabled.  Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.

Set crashkernel_low automatically if the user does not specify that.

For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.

-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
     also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.
-v6: Change description about swiotlb referring according to HATAYAMA.

Reported-by: WANG Chao &lt;chaowang@redhat.com&gt;
Tested-by: WANG Chao &lt;chaowang@redhat.com&gt;
Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y.  This causes regression
if iommu is not enabled.  Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.

Set crashkernel_low automatically if the user does not specify that.

For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.

-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
     also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.
-v6: Change description about swiotlb referring according to HATAYAMA.

Reported-by: WANG Chao &lt;chaowang@redhat.com&gt;
Tested-by: WANG Chao &lt;chaowang@redhat.com&gt;
Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@linux.intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: Fix 32-bit *_cpu_data initializers</title>
<updated>2013-03-07T04:15:50+00:00</updated>
<author>
<name>Krzysztof Mazur</name>
<email>krzysiek@podlesie.net</email>
</author>
<published>2013-03-02T23:14:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=015221fefbc93689dd47508a66326556adf2abcd'/>
<id>015221fefbc93689dd47508a66326556adf2abcd</id>
<content type='text'>
The commit 27be457000211a6903968dfce06d5f73f051a217
('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok
flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but
boot_cpu_data and new_cpu_data initializers were not changed
causing setting f00f_bug flag, instead of fdiv_bug.

If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never
cleared.

To avoid such problems in future C99-style initialization is now
used.

Signed-off-by: Krzysztof Mazur &lt;krzysiek@podlesie.net&gt;
Acked-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The commit 27be457000211a6903968dfce06d5f73f051a217
('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok
flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but
boot_cpu_data and new_cpu_data initializers were not changed
causing setting f00f_bug flag, instead of fdiv_bug.

If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never
cleared.

To avoid such problems in future C99-style initialization is now
used.

Signed-off-by: Krzysztof Mazur &lt;krzysiek@podlesie.net&gt;
Acked-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, ACPI, mm: Revert movablemem_map support</title>
<updated>2013-03-02T17:34:39+00:00</updated>
<author>
<name>Yinghai Lu</name>
<email>yinghai@kernel.org</email>
</author>
<published>2013-03-01T22:51:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=20e6926dcbafa1b361f1c29d967688be14b6ca4b'/>
<id>20e6926dcbafa1b361f1c29d967688be14b6ca4b</id>
<content type='text'>
Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node   1, Processors  #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things

1. numa_init is called several times, NOT just for srat. so those
	nodes_clear(numa_nodes_parsed)
	memset(&amp;numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
   and make fall back path working.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b.  for (i = 0; i &lt; MAX_LOCAL_APIC; i++)
	     set_apicid_to_node(i, NUMA_NO_NODE)
     still left in numa_init. So it will just clear result from early_parse_srat.
     it should be moved before that....
   c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
       early before override from INITRD is settled.

3. that patch TITLE is total misleading, there is NO x86 in the title,
   but it changes critical x86 code. It caused x86 guys did not
   pay attention to find the problem early. Those patches really should
   be routed via tip/x86/mm.

4. after that commit, following range can not use movable ram:
  a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
  b. initrd... it will be freed after booting, so it could be on movable...
  c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
	anymore.
  d. init_mem_mapping: can not put page table high anymore.
  e. initmem_init: vmemmap can not be high local node anymore. That is
     not good.

If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.

We have workaround patch that could fix some problems, but some can not
be fixed.

So just remove that offending commit and related ones including:

 f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")

 42f47e27e761 ("page_alloc: make movablemem_map have higher priority")

 6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")

 4d59a75125d5 ("x86: get pg_data_t's memory from other node")

Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0.  Also
need to find way to put other kernel used ram to local node ram.

Reported-by: Tim Gardner &lt;tim.gardner@canonical.com&gt;
Reported-by: Don Morris &lt;don.morris@hp.com&gt;
Bisected-by: Don Morris &lt;don.morris@hp.com&gt;
Tested-by: Don Morris &lt;don.morris@hp.com&gt;
Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Thomas Renninger &lt;trenn@suse.de&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node   1, Processors  #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things

1. numa_init is called several times, NOT just for srat. so those
	nodes_clear(numa_nodes_parsed)
	memset(&amp;numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
   and make fall back path working.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b.  for (i = 0; i &lt; MAX_LOCAL_APIC; i++)
	     set_apicid_to_node(i, NUMA_NO_NODE)
     still left in numa_init. So it will just clear result from early_parse_srat.
     it should be moved before that....
   c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
       early before override from INITRD is settled.

3. that patch TITLE is total misleading, there is NO x86 in the title,
   but it changes critical x86 code. It caused x86 guys did not
   pay attention to find the problem early. Those patches really should
   be routed via tip/x86/mm.

4. after that commit, following range can not use movable ram:
  a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
  b. initrd... it will be freed after booting, so it could be on movable...
  c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
	anymore.
  d. init_mem_mapping: can not put page table high anymore.
  e. initmem_init: vmemmap can not be high local node anymore. That is
     not good.

If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.

We have workaround patch that could fix some problems, but some can not
be fixed.

So just remove that offending commit and related ones including:

 f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")

 42f47e27e761 ("page_alloc: make movablemem_map have higher priority")

 6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")

 4d59a75125d5 ("x86: get pg_data_t's memory from other node")

Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0.  Also
need to find way to put other kernel used ram to local node ram.

Reported-by: Tim Gardner &lt;tim.gardner@canonical.com&gt;
Reported-by: Don Morris &lt;don.morris@hp.com&gt;
Bisected-by: Don Morris &lt;don.morris@hp.com&gt;
Tested-by: Don Morris &lt;don.morris@hp.com&gt;
Signed-off-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Thomas Renninger &lt;trenn@suse.de&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-02-28T00:17:42+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-28T00:17:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e3c4877de8b9d93bd47b6ee88eb594b1c1e10da5'/>
<id>e3c4877de8b9d93bd47b6ee88eb594b1c1e10da5</id>
<content type='text'>
Pull x86/EFI changes from Peter Anvin:

 - Improve the initrd handling in the EFI boot stub by allowing forward
   slashes in the pathname - from Chun-Yi Lee.

 - Cleanup code duplication in the EFI mixed kernel/firmware code - from
   Satoru Takeuchi.

 - efivarfs bug fixes for more strict filename validation, with lots of
   input from Al Viro.

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: remove duplicate code in setup_arch() by using, efi_is_native()
  efivarfs: guid part of filenames are case-insensitive
  efivarfs: Validate filenames much more aggressively
  efivarfs: Use sizeof() instead of magic number
  x86, efi: Allow slash in file path of initrd
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86/EFI changes from Peter Anvin:

 - Improve the initrd handling in the EFI boot stub by allowing forward
   slashes in the pathname - from Chun-Yi Lee.

 - Cleanup code duplication in the EFI mixed kernel/firmware code - from
   Satoru Takeuchi.

 - efivarfs bug fixes for more strict filename validation, with lots of
   input from Al Viro.

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: remove duplicate code in setup_arch() by using, efi_is_native()
  efivarfs: guid part of filenames are case-insensitive
  efivarfs: Validate filenames much more aggressively
  efivarfs: Use sizeof() instead of magic number
  x86, efi: Allow slash in file path of initrd
</pre>
</div>
</content>
</entry>
<entry>
<title>acpi, memory-hotplug: parse SRAT before memblock is ready</title>
<updated>2013-02-24T01:50:14+00:00</updated>
<author>
<name>Tang Chen</name>
<email>tangchen@cn.fujitsu.com</email>
</author>
<published>2013-02-23T00:33:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e8d1955258091e4c92d5a975ebd7fd8a98f5d30f'/>
<id>e8d1955258091e4c92d5a975ebd7fd8a98f5d30f</id>
<content type='text'>
On linux, the pages used by kernel could not be migrated.  As a result,
if a memory range is used by kernel, it cannot be hot-removed.  So if we
want to hot-remove memory, we should prevent kernel from using it.

The way now used to prevent this is specify a memory range by
movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory, and
reserve the memory for kernel.  And before we parse SRAT, and know the
node memory ranges, memblock is working.  And it may allocate memory in
ranges to be set as ZONE_MOVABLE.  This memory can be used by kernel,
and never be freed.

So, let's parse SRAT before memblock is called first.  And it is early
enough.

The first call of memblock_find_in_range_node() is in:

  setup_arch()
    |--&gt;setup_real_mode()

so, this patch add a function early_parse_srat() to parse SRAT, and call
it before setup_real_mode() is called.

NOTE:

1) early_parse_srat() is called before numa_init(), and has initialized
   numa_meminfo.  So DO NOT clear numa_nodes_parsed in numa_init() and DO
   NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
   numa info.

2) I don't know why using count of memory affinities parsed from SRAT
   as a return value in original acpi_numa_init().  So I add a static
   variable srat_mem_cnt to remember this count and use it as the return
   value of the new acpi_numa_init()

[mhocko@suse.cz: parse SRAT before memblock is ready fix]
Signed-off-by: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Reviewed-by: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Cc: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Wu Jianguo &lt;wujianguo@huawei.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: "Brown, Len" &lt;len.brown@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On linux, the pages used by kernel could not be migrated.  As a result,
if a memory range is used by kernel, it cannot be hot-removed.  So if we
want to hot-remove memory, we should prevent kernel from using it.

The way now used to prevent this is specify a memory range by
movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory, and
reserve the memory for kernel.  And before we parse SRAT, and know the
node memory ranges, memblock is working.  And it may allocate memory in
ranges to be set as ZONE_MOVABLE.  This memory can be used by kernel,
and never be freed.

So, let's parse SRAT before memblock is called first.  And it is early
enough.

The first call of memblock_find_in_range_node() is in:

  setup_arch()
    |--&gt;setup_real_mode()

so, this patch add a function early_parse_srat() to parse SRAT, and call
it before setup_real_mode() is called.

NOTE:

1) early_parse_srat() is called before numa_init(), and has initialized
   numa_meminfo.  So DO NOT clear numa_nodes_parsed in numa_init() and DO
   NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
   numa info.

2) I don't know why using count of memory affinities parsed from SRAT
   as a return value in original acpi_numa_init().  So I add a static
   variable srat_mem_cnt to remember this count and use it as the return
   value of the new acpi_numa_init()

[mhocko@suse.cz: parse SRAT before memblock is ready fix]
Signed-off-by: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Reviewed-by: Wen Congyang &lt;wency@cn.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Jiang Liu &lt;jiang.liu@huawei.com&gt;
Cc: Jianguo Wu &lt;wujianguo@huawei.com&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Wu Jianguo &lt;wujianguo@huawei.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: "Brown, Len" &lt;len.brown@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-02-22T02:06:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-22T02:06:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2ef14f465b9e096531343f5b734cffc5f759f4a6'/>
<id>2ef14f465b9e096531343f5b734cffc5f759f4a6</id>
<content type='text'>
Pull x86 mm changes from Peter Anvin:
 "This is a huge set of several partly interrelated (and concurrently
  developed) changes, which is why the branch history is messier than
  one would like.

  The *really* big items are two humonguous patchsets mostly developed
  by Yinghai Lu at my request, which completely revamps the way we
  create initial page tables.  In particular, rather than estimating how
  much memory we will need for page tables and then build them into that
  memory -- a calculation that has shown to be incredibly fragile -- we
  now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
  a #PF handler which creates temporary page tables on demand.

  This has several advantages:

  1. It makes it much easier to support things that need access to data
     very early (a followon patchset uses this to load microcode way
     early in the kernel startup).

  2. It allows the kernel and all the kernel data objects to be invoked
     from above the 4 GB limit.  This allows kdump to work on very large
     systems.

  3. It greatly reduces the difference between Xen and native (Xen's
     equivalent of the #PF handler are the temporary page tables created
     by the domain builder), eliminating a bunch of fragile hooks.

  The patch series also gets us a bit closer to W^X.

  Additional work in this pull is the 64-bit get_user() work which you
  were also involved with, and a bunch of cleanups/speedups to
  __phys_addr()/__pa()."

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
  x86, mm: Move reserving low memory later in initialization
  x86, doc: Clarify the use of asm("%edx") in uaccess.h
  x86, mm: Redesign get_user with a __builtin_choose_expr hack
  x86: Be consistent with data size in getuser.S
  x86, mm: Use a bitfield to mask nuisance get_user() warnings
  x86/kvm: Fix compile warning in kvm_register_steal_time()
  x86-32: Add support for 64bit get_user()
  x86-32, mm: Remove reference to alloc_remap()
  x86-32, mm: Remove reference to resume_map_numa_kva()
  x86-32, mm: Rip out x86_32 NUMA remapping code
  x86/numa: Use __pa_nodebug() instead
  x86: Don't panic if can not alloc buffer for swiotlb
  mm: Add alloc_bootmem_low_pages_nopanic()
  x86, 64bit, mm: hibernate use generic mapping_init
  x86, 64bit, mm: Mark data/bss/brk to nx
  x86: Merge early kernel reserve for 32bit and 64bit
  x86: Add Crash kernel low reservation
  x86, kdump: Remove crashkernel range find limit for 64bit
  memblock: Add memblock_mem_size()
  x86, boot: Not need to check setup_header version for setup_data
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86 mm changes from Peter Anvin:
 "This is a huge set of several partly interrelated (and concurrently
  developed) changes, which is why the branch history is messier than
  one would like.

  The *really* big items are two humonguous patchsets mostly developed
  by Yinghai Lu at my request, which completely revamps the way we
  create initial page tables.  In particular, rather than estimating how
  much memory we will need for page tables and then build them into that
  memory -- a calculation that has shown to be incredibly fragile -- we
  now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
  a #PF handler which creates temporary page tables on demand.

  This has several advantages:

  1. It makes it much easier to support things that need access to data
     very early (a followon patchset uses this to load microcode way
     early in the kernel startup).

  2. It allows the kernel and all the kernel data objects to be invoked
     from above the 4 GB limit.  This allows kdump to work on very large
     systems.

  3. It greatly reduces the difference between Xen and native (Xen's
     equivalent of the #PF handler are the temporary page tables created
     by the domain builder), eliminating a bunch of fragile hooks.

  The patch series also gets us a bit closer to W^X.

  Additional work in this pull is the 64-bit get_user() work which you
  were also involved with, and a bunch of cleanups/speedups to
  __phys_addr()/__pa()."

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
  x86, mm: Move reserving low memory later in initialization
  x86, doc: Clarify the use of asm("%edx") in uaccess.h
  x86, mm: Redesign get_user with a __builtin_choose_expr hack
  x86: Be consistent with data size in getuser.S
  x86, mm: Use a bitfield to mask nuisance get_user() warnings
  x86/kvm: Fix compile warning in kvm_register_steal_time()
  x86-32: Add support for 64bit get_user()
  x86-32, mm: Remove reference to alloc_remap()
  x86-32, mm: Remove reference to resume_map_numa_kva()
  x86-32, mm: Rip out x86_32 NUMA remapping code
  x86/numa: Use __pa_nodebug() instead
  x86: Don't panic if can not alloc buffer for swiotlb
  mm: Add alloc_bootmem_low_pages_nopanic()
  x86, 64bit, mm: hibernate use generic mapping_init
  x86, 64bit, mm: Mark data/bss/brk to nx
  x86: Merge early kernel reserve for 32bit and 64bit
  x86: Add Crash kernel low reservation
  x86, kdump: Remove crashkernel range find limit for 64bit
  memblock: Add memblock_mem_size()
  x86, boot: Not need to check setup_header version for setup_data
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'x86/mm2' into x86/mm</title>
<updated>2013-02-15T17:25:08+00:00</updated>
<author>
<name>H. Peter Anvin</name>
<email>hpa@linux.intel.com</email>
</author>
<published>2013-02-15T17:25:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0da3e7f526fde7a6522a3038b7ce609fc50f6707'/>
<id>0da3e7f526fde7a6522a3038b7ce609fc50f6707</id>
<content type='text'>
x86/mm2 is testing out fine, but has developed conflicts with x86/mm
due to patches in adjacent code.  Merge them so we can drop x86/mm2
and have a unified branch.

Resolved Conflicts:
	arch/x86/kernel/setup.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
x86/mm2 is testing out fine, but has developed conflicts with x86/mm
due to patches in adjacent code.  Merge them so we can drop x86/mm2
and have a unified branch.

Resolved Conflicts:
	arch/x86/kernel/setup.c
</pre>
</div>
</content>
</entry>
<entry>
<title>x86, mm: Move reserving low memory later in initialization</title>
<updated>2013-02-14T23:21:25+00:00</updated>
<author>
<name>H. Peter Anvin</name>
<email>hpa@zytor.com</email>
</author>
<published>2013-02-14T22:02:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=95c9608478d639dcffc14ea47b31bff021a99ed1'/>
<id>95c9608478d639dcffc14ea47b31bff021a99ed1</id>
<content type='text'>
Move the reservation of low memory, except for the 4K which actually
does belong to the BIOS, later in the initialization; in particular,
after we have already reserved the trampoline.

The current code locates the trampoline as high as possible, so by
deferring the allocation we will still be able to reserve as much
memory as is possible.  This allows us to run with reservelow=640k
without getting a crash on system startup.

Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Link: http://lkml.kernel.org/n/tip-0y9dqmmsousf69wutxwl3kkf@git.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the reservation of low memory, except for the 4K which actually
does belong to the BIOS, later in the initialization; in particular,
after we have already reserved the trampoline.

The current code locates the trampoline as high as possible, so by
deferring the allocation we will still be able to reserve as much
memory as is possible.  This allows us to run with reservelow=640k
without getting a crash on system startup.

Signed-off-by: H. Peter Anvin &lt;hpa@zytor.com&gt;
Link: http://lkml.kernel.org/n/tip-0y9dqmmsousf69wutxwl3kkf@git.kernel.org
</pre>
</div>
</content>
</entry>
</feed>
