<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/microblaze/Kconfig, branch v4.10</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>microblaze/irqchip: Move intc driver to irqchip</title>
<updated>2016-11-29T09:14:49+00:00</updated>
<author>
<name>Zubair Lutfullah Kakakhel</name>
<email>Zubair.Kakakhel@imgtec.com</email>
</author>
<published>2016-11-14T12:13:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0547dc7885856c93b0c8bbf1f590ceccf0c87835'/>
<id>0547dc7885856c93b0c8bbf1f590ceccf0c87835</id>
<content type='text'>
The Xilinx AXI Interrupt Controller IP block is used by the MIPS
based xilfpga platform and a few PowerPC based platforms.

Move the interrupt controller code out of arch/microblaze so that
it can be used by everyone

Tested-by: Michal Simek &lt;michal.simek@xilinx.com&gt;
Signed-off-by: Zubair Lutfullah Kakakhel &lt;Zubair.Kakakhel@imgtec.com&gt;
Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The Xilinx AXI Interrupt Controller IP block is used by the MIPS
based xilfpga platform and a few PowerPC based platforms.

Move the interrupt controller code out of arch/microblaze so that
it can be used by everyone

Tested-by: Michal Simek &lt;michal.simek@xilinx.com&gt;
Signed-off-by: Zubair Lutfullah Kakakhel &lt;Zubair.Kakakhel@imgtec.com&gt;
Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>microblaze: remove ARCH_WANT_OPTIONAL_GPIOLIB</title>
<updated>2016-06-08T07:56:46+00:00</updated>
<author>
<name>Linus Walleij</name>
<email>linus.walleij@linaro.org</email>
</author>
<published>2016-04-19T11:33:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0ee37414991e1be21e0f168172f0ac260e375886'/>
<id>0ee37414991e1be21e0f168172f0ac260e375886</id>
<content type='text'>
This symbols is not needed to get access to selecting the
GPIOLIB anymore: any arch can select GPIOLIB.

Cc: Michael Büsch &lt;m@bues.ch&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This symbols is not needed to get access to selecting the
GPIOLIB anymore: any arch can select GPIOLIB.

Cc: Michael Büsch &lt;m@bues.ch&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'hash' of git://ftp.sciencehorizons.net/linux</title>
<updated>2016-05-28T23:15:25+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-05-28T23:15:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7e0fb73c52c4037b4d5ef9ff56c7296a3151bd92'/>
<id>7e0fb73c52c4037b4d5ef9ff56c7296a3151bd92</id>
<content type='text'>
Pull string hash improvements from George Spelvin:
 "This series does several related things:

   - Makes the dcache hash (fs/namei.c) useful for general kernel use.

     (Thanks to Bruce for noticing the zero-length corner case)

   - Converts the string hashes in &lt;linux/sunrpc/svcauth.h&gt; to use the
     above.

   - Avoids 64-bit multiplies in hash_64() on 32-bit platforms.  Two
     32-bit multiplies will do well enough.

   - Rids the world of the bad hash multipliers in hash_32.

     This finishes the job started in commit 689de1d6ca95 ("Minimal
     fix-up of bad hashing behavior of hash_64()")

     The vast majority of Linux architectures have hardware support for
     32x32-bit multiply and so derive no benefit from "simplified"
     multipliers.

     The few processors that do not (68000, h8/300 and some models of
     Microblaze) have arch-specific implementations added.  Those
     patches are last in the series.

   - Overhauls the dcache hash mixing.

     The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
     CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
     Replaced with a much more careful design that's simultaneously
     faster and better.  (My own invention, as there was noting suitable
     in the literature I could find.  Comments welcome!)

   - Modify the hash_name() loop to skip the initial HASH_MIX().  This
     would let us salt the hash if we ever wanted to.

   - Sort out partial_name_hash().

     The hash function is declared as using a long state, even though
     it's truncated to 32 bits at the end and the extra internal state
     contributes nothing to the result.  And some callers do odd things:

      - fs/hfs/string.c only allocates 32 bits of state
      - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

   - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
     rather than 0..sizeof(long)-1.  This would simplify users other
     than full_name_hash"

  Special thanks to Bruce Fields for testing and finding bugs in v1.  (I
  learned some humbling lessons about "obviously correct" code.)

  On the arch-specific front, the m68k assembly has been tested in a
  standalone test harness, I've been in contact with the Microblaze
  maintainers who mostly don't care, as the hardware multiplier is never
  omitted in real-world applications, and I haven't heard anything from
  the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
  h8300: Add &lt;asm/hash.h&gt;
  microblaze: Add &lt;asm/hash.h&gt;
  m68k: Add &lt;asm/hash.h&gt;
  &lt;linux/hash.h&gt;: Add support for architecture-specific functions
  fs/namei.c: Improve dcache hash function
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  Change hash_64() return value to 32 bits
  &lt;linux/sunrpc/svcauth.h&gt;: Define hash_str() in terms of hashlen_string()
  fs/namei.c: Add hashlen_string() function
  Pull out string hash to &lt;linux/stringhash.h&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull string hash improvements from George Spelvin:
 "This series does several related things:

   - Makes the dcache hash (fs/namei.c) useful for general kernel use.

     (Thanks to Bruce for noticing the zero-length corner case)

   - Converts the string hashes in &lt;linux/sunrpc/svcauth.h&gt; to use the
     above.

   - Avoids 64-bit multiplies in hash_64() on 32-bit platforms.  Two
     32-bit multiplies will do well enough.

   - Rids the world of the bad hash multipliers in hash_32.

     This finishes the job started in commit 689de1d6ca95 ("Minimal
     fix-up of bad hashing behavior of hash_64()")

     The vast majority of Linux architectures have hardware support for
     32x32-bit multiply and so derive no benefit from "simplified"
     multipliers.

     The few processors that do not (68000, h8/300 and some models of
     Microblaze) have arch-specific implementations added.  Those
     patches are last in the series.

   - Overhauls the dcache hash mixing.

     The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
     CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
     Replaced with a much more careful design that's simultaneously
     faster and better.  (My own invention, as there was noting suitable
     in the literature I could find.  Comments welcome!)

   - Modify the hash_name() loop to skip the initial HASH_MIX().  This
     would let us salt the hash if we ever wanted to.

   - Sort out partial_name_hash().

     The hash function is declared as using a long state, even though
     it's truncated to 32 bits at the end and the extra internal state
     contributes nothing to the result.  And some callers do odd things:

      - fs/hfs/string.c only allocates 32 bits of state
      - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

   - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
     rather than 0..sizeof(long)-1.  This would simplify users other
     than full_name_hash"

  Special thanks to Bruce Fields for testing and finding bugs in v1.  (I
  learned some humbling lessons about "obviously correct" code.)

  On the arch-specific front, the m68k assembly has been tested in a
  standalone test harness, I've been in contact with the Microblaze
  maintainers who mostly don't care, as the hardware multiplier is never
  omitted in real-world applications, and I haven't heard anything from
  the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
  h8300: Add &lt;asm/hash.h&gt;
  microblaze: Add &lt;asm/hash.h&gt;
  m68k: Add &lt;asm/hash.h&gt;
  &lt;linux/hash.h&gt;: Add support for architecture-specific functions
  fs/namei.c: Improve dcache hash function
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  Change hash_64() return value to 32 bits
  &lt;linux/sunrpc/svcauth.h&gt;: Define hash_str() in terms of hashlen_string()
  fs/namei.c: Add hashlen_string() function
  Pull out string hash to &lt;linux/stringhash.h&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>microblaze: Add &lt;asm/hash.h&gt;</title>
<updated>2016-05-28T19:48:58+00:00</updated>
<author>
<name>George Spelvin</name>
<email>linux@sciencehorizons.net</email>
</author>
<published>2016-05-25T15:06:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7b13277b682972c2ff8f6419e86c333d81936023'/>
<id>7b13277b682972c2ff8f6419e86c333d81936023</id>
<content type='text'>
Microblaze is an FPGA soft core that can be configured various ways.

If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.

Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.

Signed-off-by: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Alistair Francis &lt;alistair.francis@xilinx.com&gt;
Cc: Michal Simek &lt;michal.simek@xilinx.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Microblaze is an FPGA soft core that can be configured various ways.

If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.

Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.

Signed-off-by: George Spelvin &lt;linux@sciencehorizons.net&gt;
Cc: Alistair Francis &lt;alistair.francis@xilinx.com&gt;
Cc: Michal Simek &lt;michal.simek@xilinx.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/GCD.c: use binary GCD algorithm instead of Euclidean</title>
<updated>2016-05-21T00:58:30+00:00</updated>
<author>
<name>Zhaoxiu Zeng</name>
<email>zhaoxiu.zeng@gmail.com</email>
</author>
<published>2016-05-21T00:03:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fff7fb0b2d908dec779783d8eaf3d7725230f75e'/>
<id>fff7fb0b2d908dec779783d8eaf3d7725230f75e</id>
<content type='text'>
The binary GCD algorithm is based on the following facts:
	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% the execution time) than the
division-based Euclidian algorithm.

On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.

There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available.  This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.

If fast __ffs is not available, the "even/odd" GCD variant is used.

I use the following code to benchmark:

	#include &lt;stdio.h&gt;
	#include &lt;stdlib.h&gt;
	#include &lt;stdint.h&gt;
	#include &lt;string.h&gt;
	#include &lt;time.h&gt;
	#include &lt;unistd.h&gt;

	#define swap(a, b) \
		do { \
			a ^= b; \
			b ^= a; \
			a ^= b; \
		} while (0)

	unsigned long gcd0(unsigned long a, unsigned long b)
	{
		unsigned long r;

		if (a &lt; b) {
			swap(a, b);
		}

		if (b == 0)
			return a;

		while ((r = a % b) != 0) {
			a = b;
			b = r;
		}

		return b;
	}

	unsigned long gcd1(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b &gt;&gt;= __builtin_ctzl(b);

		for (;;) {
			a &gt;&gt;= __builtin_ctzl(a);
			if (a == b)
				return a &lt;&lt; __builtin_ctzl(r);

			if (a &lt; b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd2(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &amp;= -r;

		while (!(b &amp; r))
			b &gt;&gt;= 1;

		for (;;) {
			while (!(a &amp; r))
				a &gt;&gt;= 1;
			if (a == b)
				return a;

			if (a &lt; b)
				swap(a, b);
			a -= b;
			a &gt;&gt;= 1;
			if (a &amp; r)
				a += b;
			a &gt;&gt;= 1;
		}
	}

	unsigned long gcd3(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b &gt;&gt;= __builtin_ctzl(b);
		if (b == 1)
			return r &amp; -r;

		for (;;) {
			a &gt;&gt;= __builtin_ctzl(a);
			if (a == 1)
				return r &amp; -r;
			if (a == b)
				return a &lt;&lt; __builtin_ctzl(r);

			if (a &lt; b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd4(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &amp;= -r;

		while (!(b &amp; r))
			b &gt;&gt;= 1;
		if (b == r)
			return r;

		for (;;) {
			while (!(a &amp; r))
				a &gt;&gt;= 1;
			if (a == r)
				return r;
			if (a == b)
				return a;

			if (a &lt; b)
				swap(a, b);
			a -= b;
			a &gt;&gt;= 1;
			if (a &amp; r)
				a += b;
			a &gt;&gt;= 1;
		}
	}

	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
		gcd0, gcd1, gcd2, gcd3, gcd4,
	};

	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

	#if defined(__x86_64__)

	#define rdtscll(val) do { \
		unsigned long __a,__d; \
		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)&lt;&lt;32); \
	} while(0)

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		unsigned long long start, end;
		unsigned long long ret;
		unsigned long gcd_res;

		rdtscll(start);
		gcd_res = gcd(a, b);
		rdtscll(end);

		if (end &gt;= start)
			ret = end - start;
		else
			ret = ~0ULL - start + 1 + end;

		*res = gcd_res;
		return ret;
	}

	#else

	static inline struct timespec read_time(void)
	{
		struct timespec time;
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &amp;time);
		return time;
	}

	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
	{
		struct timespec temp;

		if ((end.tv_nsec - start.tv_nsec) &lt; 0) {
			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
		} else {
			temp.tv_sec = end.tv_sec - start.tv_sec;
			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
		}

		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
	}

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		struct timespec start, end;
		unsigned long gcd_res;

		start = read_time();
		gcd_res = gcd(a, b);
		end = read_time();

		*res = gcd_res;
		return diff_time(start, end);
	}

	#endif

	static inline unsigned long get_rand()
	{
		if (sizeof(long) == 8)
			return (unsigned long)rand() &lt;&lt; 32 | rand();
		else
			return rand();
	}

	int main(int argc, char **argv)
	{
		unsigned int seed = time(0);
		int loops = 100;
		int repeats = 1000;
		unsigned long (*res)[TEST_ENTRIES];
		unsigned long long elapsed[TEST_ENTRIES];
		int i, j, k;

		for (;;) {
			int opt = getopt(argc, argv, "n:r:s:");
			/* End condition always first */
			if (opt == -1)
				break;

			switch (opt) {
			case 'n':
				loops = atoi(optarg);
				break;
			case 'r':
				repeats = atoi(optarg);
				break;
			case 's':
				seed = strtoul(optarg, NULL, 10);
				break;
			default:
				/* You won't actually get here. */
				break;
			}
		}

		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
		memset(elapsed, 0, sizeof(elapsed));

		srand(seed);
		for (j = 0; j &lt; loops; j++) {
			unsigned long a = get_rand();
			/* Do we have args? */
			unsigned long b = argc &gt; optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			unsigned long long min_elapsed[TEST_ENTRIES];
			for (k = 0; k &lt; repeats; k++) {
				for (i = 0; i &lt; TEST_ENTRIES; i++) {
					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &amp;res[j][i]);
					if (k == 0 || min_elapsed[i] &gt; tmp)
						min_elapsed[i] = tmp;
				}
			}
			for (i = 0; i &lt; TEST_ENTRIES; i++)
				elapsed[i] += min_elapsed[i];
		}

		for (i = 0; i &lt; TEST_ENTRIES; i++)
			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

		k = 0;
		srand(seed);
		for (j = 0; j &lt; loops; j++) {
			unsigned long a = get_rand();
			unsigned long b = argc &gt; optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			for (i = 1; i &lt; TEST_ENTRIES; i++) {
				if (res[j][i] != res[j][0])
					break;
			}
			if (i &lt; TEST_ENTRIES) {
				if (k == 0) {
					k = 1;
					fprintf(stderr, "Error:\n");
				}
				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
				for (i = 0; i &lt; TEST_ENTRIES; i++)
					fprintf(stderr, "%ld%s", res[j][i], i &lt; TEST_ENTRIES - 1 ? ", " : "\n");
			}
		}

		if (k == 0)
			fprintf(stderr, "PASS\n");

		free(res);

		return 0;
	}

Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 10174
  gcd1: elapsed 2120
  gcd2: elapsed 2902
  gcd3: elapsed 2039
  gcd4: elapsed 2812
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9309
  gcd1: elapsed 2280
  gcd2: elapsed 2822
  gcd3: elapsed 2217
  gcd4: elapsed 2710
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9589
  gcd1: elapsed 2098
  gcd2: elapsed 2815
  gcd3: elapsed 2030
  gcd4: elapsed 2718
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9914
  gcd1: elapsed 2309
  gcd2: elapsed 2779
  gcd3: elapsed 2228
  gcd4: elapsed 2709
  PASS

[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng &lt;zhaoxiu.zeng@gmail.com&gt;
Signed-off-by: George Spelvin &lt;linux@horizon.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The binary GCD algorithm is based on the following facts:
	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% the execution time) than the
division-based Euclidian algorithm.

On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.

There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available.  This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.

If fast __ffs is not available, the "even/odd" GCD variant is used.

I use the following code to benchmark:

	#include &lt;stdio.h&gt;
	#include &lt;stdlib.h&gt;
	#include &lt;stdint.h&gt;
	#include &lt;string.h&gt;
	#include &lt;time.h&gt;
	#include &lt;unistd.h&gt;

	#define swap(a, b) \
		do { \
			a ^= b; \
			b ^= a; \
			a ^= b; \
		} while (0)

	unsigned long gcd0(unsigned long a, unsigned long b)
	{
		unsigned long r;

		if (a &lt; b) {
			swap(a, b);
		}

		if (b == 0)
			return a;

		while ((r = a % b) != 0) {
			a = b;
			b = r;
		}

		return b;
	}

	unsigned long gcd1(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b &gt;&gt;= __builtin_ctzl(b);

		for (;;) {
			a &gt;&gt;= __builtin_ctzl(a);
			if (a == b)
				return a &lt;&lt; __builtin_ctzl(r);

			if (a &lt; b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd2(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &amp;= -r;

		while (!(b &amp; r))
			b &gt;&gt;= 1;

		for (;;) {
			while (!(a &amp; r))
				a &gt;&gt;= 1;
			if (a == b)
				return a;

			if (a &lt; b)
				swap(a, b);
			a -= b;
			a &gt;&gt;= 1;
			if (a &amp; r)
				a += b;
			a &gt;&gt;= 1;
		}
	}

	unsigned long gcd3(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b &gt;&gt;= __builtin_ctzl(b);
		if (b == 1)
			return r &amp; -r;

		for (;;) {
			a &gt;&gt;= __builtin_ctzl(a);
			if (a == 1)
				return r &amp; -r;
			if (a == b)
				return a &lt;&lt; __builtin_ctzl(r);

			if (a &lt; b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd4(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &amp;= -r;

		while (!(b &amp; r))
			b &gt;&gt;= 1;
		if (b == r)
			return r;

		for (;;) {
			while (!(a &amp; r))
				a &gt;&gt;= 1;
			if (a == r)
				return r;
			if (a == b)
				return a;

			if (a &lt; b)
				swap(a, b);
			a -= b;
			a &gt;&gt;= 1;
			if (a &amp; r)
				a += b;
			a &gt;&gt;= 1;
		}
	}

	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
		gcd0, gcd1, gcd2, gcd3, gcd4,
	};

	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

	#if defined(__x86_64__)

	#define rdtscll(val) do { \
		unsigned long __a,__d; \
		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)&lt;&lt;32); \
	} while(0)

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		unsigned long long start, end;
		unsigned long long ret;
		unsigned long gcd_res;

		rdtscll(start);
		gcd_res = gcd(a, b);
		rdtscll(end);

		if (end &gt;= start)
			ret = end - start;
		else
			ret = ~0ULL - start + 1 + end;

		*res = gcd_res;
		return ret;
	}

	#else

	static inline struct timespec read_time(void)
	{
		struct timespec time;
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &amp;time);
		return time;
	}

	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
	{
		struct timespec temp;

		if ((end.tv_nsec - start.tv_nsec) &lt; 0) {
			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
		} else {
			temp.tv_sec = end.tv_sec - start.tv_sec;
			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
		}

		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
	}

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		struct timespec start, end;
		unsigned long gcd_res;

		start = read_time();
		gcd_res = gcd(a, b);
		end = read_time();

		*res = gcd_res;
		return diff_time(start, end);
	}

	#endif

	static inline unsigned long get_rand()
	{
		if (sizeof(long) == 8)
			return (unsigned long)rand() &lt;&lt; 32 | rand();
		else
			return rand();
	}

	int main(int argc, char **argv)
	{
		unsigned int seed = time(0);
		int loops = 100;
		int repeats = 1000;
		unsigned long (*res)[TEST_ENTRIES];
		unsigned long long elapsed[TEST_ENTRIES];
		int i, j, k;

		for (;;) {
			int opt = getopt(argc, argv, "n:r:s:");
			/* End condition always first */
			if (opt == -1)
				break;

			switch (opt) {
			case 'n':
				loops = atoi(optarg);
				break;
			case 'r':
				repeats = atoi(optarg);
				break;
			case 's':
				seed = strtoul(optarg, NULL, 10);
				break;
			default:
				/* You won't actually get here. */
				break;
			}
		}

		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
		memset(elapsed, 0, sizeof(elapsed));

		srand(seed);
		for (j = 0; j &lt; loops; j++) {
			unsigned long a = get_rand();
			/* Do we have args? */
			unsigned long b = argc &gt; optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			unsigned long long min_elapsed[TEST_ENTRIES];
			for (k = 0; k &lt; repeats; k++) {
				for (i = 0; i &lt; TEST_ENTRIES; i++) {
					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &amp;res[j][i]);
					if (k == 0 || min_elapsed[i] &gt; tmp)
						min_elapsed[i] = tmp;
				}
			}
			for (i = 0; i &lt; TEST_ENTRIES; i++)
				elapsed[i] += min_elapsed[i];
		}

		for (i = 0; i &lt; TEST_ENTRIES; i++)
			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

		k = 0;
		srand(seed);
		for (j = 0; j &lt; loops; j++) {
			unsigned long a = get_rand();
			unsigned long b = argc &gt; optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			for (i = 1; i &lt; TEST_ENTRIES; i++) {
				if (res[j][i] != res[j][0])
					break;
			}
			if (i &lt; TEST_ENTRIES) {
				if (k == 0) {
					k = 1;
					fprintf(stderr, "Error:\n");
				}
				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
				for (i = 0; i &lt; TEST_ENTRIES; i++)
					fprintf(stderr, "%ld%s", res[j][i], i &lt; TEST_ENTRIES - 1 ? ", " : "\n");
			}
		}

		if (k == 0)
			fprintf(stderr, "PASS\n");

		free(res);

		return 0;
	}

Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 10174
  gcd1: elapsed 2120
  gcd2: elapsed 2902
  gcd3: elapsed 2039
  gcd4: elapsed 2812
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9309
  gcd1: elapsed 2280
  gcd2: elapsed 2822
  gcd3: elapsed 2217
  gcd4: elapsed 2710
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9589
  gcd1: elapsed 2098
  gcd2: elapsed 2815
  gcd3: elapsed 2030
  gcd4: elapsed 2718
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9914
  gcd1: elapsed 2309
  gcd2: elapsed 2779
  gcd3: elapsed 2228
  gcd4: elapsed 2709
  PASS

[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng &lt;zhaoxiu.zeng@gmail.com&gt;
Signed-off-by: George Spelvin &lt;linux@horizon.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>microblaze/PCI: Support generic Xilinx AXI PCIe Host Bridge IP driver</title>
<updated>2016-03-08T20:25:57+00:00</updated>
<author>
<name>Bharat Kumar Gogada</name>
<email>bharat.kumar.gogada@xilinx.com</email>
</author>
<published>2016-02-11T16:28:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=01cf9d524ff06703dbed37014f34dba9c1b2434d'/>
<id>01cf9d524ff06703dbed37014f34dba9c1b2434d</id>
<content type='text'>
Modify the Microblaze PCI subsystem to work with the generic
drivers/pci/host/pcie-xilinx.c driver on Microblaze and Zynq.

[bhelgaas: changelog]
Signed-off-by: Bharat Kumar Gogada &lt;bharatku@xilinx.com&gt;
Signed-off-by: Ravi Kiran Gummaluri &lt;rgummal@xilinx.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Acked-by: Michal Simek &lt;michal.simek@xilinx.com&gt;</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Modify the Microblaze PCI subsystem to work with the generic
drivers/pci/host/pcie-xilinx.c driver on Microblaze and Zynq.

[bhelgaas: changelog]
Signed-off-by: Bharat Kumar Gogada &lt;bharatku@xilinx.com&gt;
Signed-off-by: Ravi Kiran Gummaluri &lt;rgummal@xilinx.com&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Acked-by: Michal Simek &lt;michal.simek@xilinx.com&gt;</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: always provide the dma_map_ops based implementation</title>
<updated>2016-01-21T01:09:18+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-01-20T23:02:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e1c7e324539ada3b2b13ca2898bcb4948a9ef9db'/>
<id>e1c7e324539ada3b2b13ca2898bcb4948a9ef9db</id>
<content type='text'>
Move the generic implementation to &lt;linux/dma-mapping.h&gt; now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Aurelien Jacquiot &lt;a-jacquiot@ti.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@ezchip.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Haavard Skinnemoen &lt;hskinnemoen@gmail.com&gt;
Cc: Hans-Christian Egtvedt &lt;egtvedt@samfundet.no&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Jesper Nilsson &lt;jesper.nilsson@axis.com&gt;
Cc: Koichi Yasutake &lt;yasutake.koichi@jp.panasonic.com&gt;
Cc: Ley Foon Tan &lt;lftan@altera.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Mikael Starvik &lt;starvik@axis.com&gt;
Cc: Steven Miao &lt;realmz6@gmail.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Joerg Roedel &lt;jroedel@suse.de&gt;
Cc: Sebastian Ott &lt;sebott@linux.vnet.ibm.com&gt;
Signed-off-by: Valentin Rothberg &lt;valentinrothberg@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the generic implementation to &lt;linux/dma-mapping.h&gt; now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Aurelien Jacquiot &lt;a-jacquiot@ti.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@ezchip.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Haavard Skinnemoen &lt;hskinnemoen@gmail.com&gt;
Cc: Hans-Christian Egtvedt &lt;egtvedt@samfundet.no&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Jesper Nilsson &lt;jesper.nilsson@axis.com&gt;
Cc: Koichi Yasutake &lt;yasutake.koichi@jp.panasonic.com&gt;
Cc: Ley Foon Tan &lt;lftan@altera.com&gt;
Cc: Mark Salter &lt;msalter@redhat.com&gt;
Cc: Mikael Starvik &lt;starvik@axis.com&gt;
Cc: Steven Miao &lt;realmz6@gmail.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Joerg Roedel &lt;jroedel@suse.de&gt;
Cc: Sebastian Ott &lt;sebott@linux.vnet.ibm.com&gt;
Signed-off-by: Valentin Rothberg &lt;valentinrothberg@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Kconfig: remove HAVE_LATENCYTOP_SUPPORT</title>
<updated>2016-01-16T19:17:23+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2016-01-16T00:58:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=da48d094ce5d7c7dcdad9011648a81c42fd1c2ef'/>
<id>da48d094ce5d7c7dcdad9011648a81c42fd1c2ef</id>
<content type='text'>
As illustrated by commit a3afe70b83fd ("[S390] latencytop s390
support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
advertise an implementation of save_stack_trace_tsk.

However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk()
weak alias") a dummy implementation is provided if STACKTRACE=y.  Given
that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Guan Xuetao &lt;gxt@mprc.pku.edu.cn&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As illustrated by commit a3afe70b83fd ("[S390] latencytop s390
support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
advertise an implementation of save_stack_trace_tsk.

However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk()
weak alias") a dummy implementation is provided if STACKTRACE=y.  Given
that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Guan Xuetao &lt;gxt@mprc.pku.edu.cn&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>gcov: enable GCOV_PROFILE_ALL from ARCH Kconfigs</title>
<updated>2014-12-13T20:42:51+00:00</updated>
<author>
<name>Riku Voipio</name>
<email>riku.voipio@linaro.org</email>
</author>
<published>2014-12-13T00:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=957e3facd147510f2cf8780e38606f1d707f0e33'/>
<id>957e3facd147510f2cf8780e38606f1d707f0e33</id>
<content type='text'>
Following the suggestions from Andrew Morton and Stephen Rothwell,
Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
can enable.

set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
previously allowed + ARM64 which I tested.

Signed-off-by: Riku Voipio &lt;riku.voipio@linaro.org&gt;
Cc: Peter Oberparleiter &lt;oberpar@linux.vnet.ibm.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Following the suggestions from Andrew Morton and Stephen Rothwell,
Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
can enable.

set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
previously allowed + ARM64 which I tested.

Signed-off-by: Riku Voipio &lt;riku.voipio@linaro.org&gt;
Cc: Peter Oberparleiter &lt;oberpar@linux.vnet.ibm.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>microblaze: Fix missing NR_CPUS in menuconfig</title>
<updated>2014-10-27T07:29:54+00:00</updated>
<author>
<name>Michal Simek</name>
<email>michal.simek@xilinx.com</email>
</author>
<published>2014-10-27T07:28:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4cbbbb43d666468d87f99676cf05fb5154d1228b'/>
<id>4cbbbb43d666468d87f99676cf05fb5154d1228b</id>
<content type='text'>
The time Kconfig expects that NR_CPUS is defined.

This patch remove this config warning:
"kernel/time/Kconfig:163:warning: range is invalid"

Signed-off-by: Michal Simek &lt;michal.simek@xilinx.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The time Kconfig expects that NR_CPUS is defined.

This patch remove this config warning:
"kernel/time/Kconfig:163:warning: range is invalid"

Signed-off-by: Michal Simek &lt;michal.simek@xilinx.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
