From 7817b799ed6b270fbf7f2b30efd0ae011dfc9644 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Tue, 29 Dec 2015 16:23:18 -0800
Subject: documentation: Fix control dependency and identical stores

The summary of the "CONTROL DEPENDENCIES" section incorrectly states that
barrier() may be used to prevent compiler reordering when more than one
leg of the control-dependent "if" statement starts with identical stores.
This is incorrect at high optimization levels.  This commit therefore
updates the summary to match the detailed description.

Reported-by: Jianyu Zhan <nasa4836@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
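Illustrative note, not part of the commit: the corrected rule can be
sketched in user-space C.  The macros below are hypothetical stand-ins that
map READ_ONCE() and smp_store_release() onto C11 atomics; that mapping is an
assumption made for this sketch and is not the kernel's implementation.
The variables a and b follow the naming used in memory-barriers.txt, and
path_taken is an invented helper that records which leg ran.

```c
#include <stdatomic.h>

/* Hypothetical shared variables, named after a and b in memory-barriers.txt. */
static atomic_int a;
static atomic_int b;
static int path_taken;	/* Invented for this sketch: 1 for "if", 2 for "else". */

/*
 * User-space stand-ins for the kernel primitives.  Mapping them onto C11
 * atomics is an assumption made for illustration only.
 */
#define READ_ONCE_INT(x) \
	atomic_load_explicit(&(x), memory_order_relaxed)
#define STORE_RELEASE_INT(x, v) \
	atomic_store_explicit(&(x), (v), memory_order_release)

void control_dependent_store(void)
{
	int q = READ_ONCE_INT(a);

	if (q) {
		/* Release store: ordered after the earlier load from a. */
		STORE_RELEASE_INT(b, 1);
		path_taken = 1;
	} else {
		/* Identical store in the other leg, also a release store. */
		STORE_RELEASE_INT(b, 1);
		path_taken = 2;
	}
}
```

Because both legs use a release store, the ordering against the load from
"a" no longer depends on the compiler leaving the two stores inside their
respective legs, which is the property that barrier() fails to guarantee.
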
 Documentation/memory-barriers.txt | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 904ee42d078e..e26058d3e253 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -800,9 +800,13 @@ In summary:
       use smp_rmb(), smp_wmb(), or, in the case of prior stores and
       later loads, smp_mb().
 
-  (*) If both legs of the "if" statement begin with identical stores
-      to the same variable, a barrier() statement is required at the
-      beginning of each leg of the "if" statement.
+  (*) If both legs of the "if" statement begin with identical stores to
+      the same variable, then those stores must be ordered, either by
+      preceding both of them with smp_mb() or by using smp_store_release()
+      to carry out the stores.  Please note that it is -not- sufficient
+      to use barrier() at the beginning of each leg of the "if" statement,
+      as optimizing compilers do not necessarily respect barrier()
+      in this case.
 
   (*) Control dependencies require at least one run-time conditional
       between the prior load and the subsequent store, and this
-- 
cgit v1.2.3


From 895f5542220eeea43b811a9b4cd73f244c5673d7 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Wed, 6 Jan 2016 14:23:03 -0800
Subject: documentation: Fix memory-barriers.txt section references

This commit fixes a couple of "Compiler Barrier" section references to
be "COMPILER BARRIER".  This makes it easier to find the section in
the usual text editors.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index e26058d3e253..c90922b9b294 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -232,7 +232,7 @@ And there are a number of things that _must_ or _must_not_ be assumed:
      with memory references that are not protected by READ_ONCE() and
      WRITE_ONCE().  Without them, the compiler is within its rights to
      do all sorts of "creative" transformations, which are covered in
-     the Compiler Barrier section.
+     the COMPILER BARRIER section.
 
  (*) It _must_not_ be assumed that independent loads and stores will be issued
      in the order given.  This means that for:
@@ -818,7 +818,7 @@ In summary:
   (*) Control dependencies require that the compiler avoid reordering the
       dependency into nonexistence.  Careful use of READ_ONCE() or
       atomic{,64}_read() can help to preserve your control dependency.
-      Please see the Compiler Barrier section for more information.
+      Please see the COMPILER BARRIER section for more information.
 
   (*) Control dependencies pair normally with other types of barriers.
 
-- 
cgit v1.2.3


From 0e4bd2aba3d0ae5caeb0d1a2b71f6fe6147c4d56 Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Thu, 14 Jan 2016 14:01:34 -0800
Subject: documentation: Remove obsolete reference to RCU-protected indexes

Commit #1ebee8017d84 (rcu: Eliminate array-index-based RCU primitives)
eliminated the primitives supporting RCU-protected array indexes, but
failed to update Documentation/memory-barriers.txt accordingly.  This
commit therefore removes the discussion of RCU-protected array indexes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 15 ---------------
 1 file changed, 15 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index c90922b9b294..6bee0a2c43ab 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -565,21 +565,6 @@ odd-numbered bank is idle, one can see the new value of the pointer P (&B),
 but the old value of the variable B (2).
 
 
-Another example of where data dependency barriers might be required is where a
-number is read from memory and then used to calculate the index for an array
-access:
-
-	CPU 1		      CPU 2
-	===============	      ===============
-	{ M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
-	M[1] = 4;
-	<write barrier>
-	WRITE_ONCE(P, 1);
-			      Q = READ_ONCE(P);
-			      <data dependency barrier>
-			      D = M[Q];
-
-
 The data dependency barrier is very important to the RCU system,
 for example.  See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h.  This permits the current target of an RCU'd
-- 
cgit v1.2.3


From 92a84dd210b8263f765882d3ee1a1d5cd348c16a Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Thu, 14 Jan 2016 14:17:04 -0800
Subject: documentation: Subsequent writes ordered by rcu_dereference()

The current memory-barriers.txt does not address the possibility of
a write to a dereferenced pointer.  This should be rare, but when it
happens, we need that write -not- to be clobbered by the initialization.
This commit therefore adds an example showing a data dependency ordering
a later data-dependent write.

Reported-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
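Illustrative note, not part of the commit: the dependent-write pattern can
be sketched in user-space C.  This sketch assumes that a C11 release store
may stand in for <write barrier> followed by WRITE_ONCE(), and that a C11
acquire load may stand in for READ_ONCE() followed by <data dependency
barrier>.  The acquire load is stronger than the kernel requires and was
chosen only because it is portable.  The function names are invented for
this sketch.

```c
#include <stdatomic.h>

static int A = 1, B = 2;
static _Atomic(int *) P = &A;

/* CPU 1: initialize B, then publish its address through P. */
void cpu1_publish(void)
{
	B = 4;
	/* Stand-in for <write barrier> followed by WRITE_ONCE(P, &B). */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* CPU 2: pick up the published pointer and write through it. */
int *cpu2_record(void)
{
	/* Stand-in for READ_ONCE(P) followed by <data dependency barrier>. */
	int *Q = atomic_load_explicit(&P, memory_order_acquire);

	*Q = 5;	/* Must not be clobbered by CPU 1's earlier "B = 4". */
	return Q;
}
```

If cpu2_record() observes &B, the store of 5 is ordered after CPU 1's
initialization of B, so the recorded value survives.
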
 Documentation/memory-barriers.txt | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 6bee0a2c43ab..e9ebeb3b1077 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -555,6 +555,30 @@ between the address load and the data load:
 This enforces the occurrence of one of the two implications, and prevents the
 third possibility from arising.
 
+A data-dependency barrier must also order against dependent writes:
+
+	CPU 1		      CPU 2
+	===============	      ===============
+	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
+	B = 4;
+	<write barrier>
+	WRITE_ONCE(P, &B);
+			      Q = READ_ONCE(P);
+			      <data dependency barrier>
+			      *Q = 5;
+
+The data-dependency barrier must order the read into Q with the store
+into *Q.  This prohibits this outcome:
+
+	(Q == &B) && (B == 4)
+
+Please note that this pattern should be rare.  After all, the whole point
+of dependency ordering is to -prevent- writes to the data structure, along
+with the expensive cache misses associated with those writes.  This pattern
+can be used to record rare error conditions and the like, and the ordering
+prevents such records from being lost.
+
+
 [!] Note that this extremely counterintuitive situation arises most easily on
 machines with split caches, so that, for example, one cache bank processes
 even-numbered cache lines and the other bank processes odd-numbered cache
-- 
cgit v1.2.3


From c535cc92924baf68e238bd1b5ff8d74883f88b9b Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Fri, 15 Jan 2016 09:30:42 -0800
Subject: documentation: Distinguish between local and global transitivity

The introduction of smp_load_acquire() and smp_store_release() had
the side effect of introducing a weaker notion of transitivity:
The transitivity of full smp_mb() barriers is global, but that
of smp_store_release()/smp_load_acquire() chains is local.  This
commit therefore introduces the notion of local transitivity and
gives an example.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
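Illustrative note, not part of the commit: a compilable user-space
approximation of the four-CPU example, with the r0..r5 result variables
declared explicitly.  The lower-case macros are hypothetical stand-ins that
map the kernel primitives onto C11 atomics, and smp_mb() is approximated by
a sequentially consistent fence; both mappings are assumptions made for this
sketch.  The helper outcome_is_prohibited() is likewise invented here and
simply encodes the two outcomes that the documentation text rules out.

```c
#include <stdatomic.h>

static atomic_int u, v, x, y, z;
static int r0, r1, r2, r3, r4, r5;

/* Hypothetical stand-ins for the kernel primitives (assumed mappings). */
#define load_acquire(p)		atomic_load_explicit((p), memory_order_acquire)
#define store_release(p, val)	atomic_store_explicit((p), (val), memory_order_release)
#define read_once(p)		atomic_load_explicit((p), memory_order_relaxed)
#define write_once(p, val)	atomic_store_explicit((p), (val), memory_order_relaxed)

void cpu0(void)
{
	r0 = load_acquire(&x);
	write_once(&u, 1);
	store_release(&y, 1);
}

void cpu1(void)
{
	r1 = load_acquire(&y);
	r4 = read_once(&v);
	r5 = read_once(&u);
	store_release(&z, 1);
}

void cpu2(void)
{
	r2 = load_acquire(&z);
	store_release(&x, 1);
}

void cpu3(void)
{
	write_once(&v, 1);
	atomic_thread_fence(memory_order_seq_cst);	/* Approximates smp_mb(). */
	r3 = read_once(&u);
}

/* Returns 1 if r0..r5 match an outcome the text describes as prohibited. */
int outcome_is_prohibited(void)
{
	if (r0 == 1 && r1 == 1 && r2 == 1)
		return 1;	/* Cycle through the release-acquire chain. */
	if (r1 == 1 && r5 == 0)
		return 1;	/* cpu1() must see cpu0()'s write to u. */
	return 0;
}
```

Running the four functions on separate threads and checking
outcome_is_prohibited() afterwards is one way to experiment with the chain,
although a test run that never observes a prohibited outcome proves nothing
about the memory model.
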
 Documentation/memory-barriers.txt | 78 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 76 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index e9ebeb3b1077..ae9d306725ba 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
 General barriers are therefore required to ensure that all CPUs agree
 on the combined order of CPU 1's and CPU 2's accesses.
 
-To reiterate, if your code requires transitivity, use general barriers
-throughout.
+General barriers provide "global transitivity", so that all CPUs will
+agree on the order of operations.  In contrast, a chain of release-acquire
+pairs provides only "local transitivity", so that only those CPUs on
+the chain are guaranteed to agree on the combined order of the accesses.
+For example, switching to C code in deference to Herman Hollerith:
+
+	int u, v, x, y, z;
+
+	void cpu0(void)
+	{
+		r0 = smp_load_acquire(&x);
+		WRITE_ONCE(u, 1);
+		smp_store_release(&y, 1);
+	}
+
+	void cpu1(void)
+	{
+		r1 = smp_load_acquire(&y);
+		r4 = READ_ONCE(v);
+		r5 = READ_ONCE(u);
+		smp_store_release(&z, 1);
+	}
+
+	void cpu2(void)
+	{
+		r2 = smp_load_acquire(&z);
+		smp_store_release(&x, 1);
+	}
+
+	void cpu3(void)
+	{
+		WRITE_ONCE(v, 1);
+		smp_mb();
+		r3 = READ_ONCE(u);
+	}
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+chain of smp_store_release()/smp_load_acquire() pairs, the following
+outcome is prohibited:
+
+	r0 == 1 && r1 == 1 && r2 == 1
+
+Furthermore, because of the release-acquire relationship between cpu0()
+and cpu1(), cpu1() must see cpu0()'s writes, so that the following
+outcome is prohibited:
+
+	r1 == 1 && r5 == 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3().  Therefore, the following outcome
+is possible:
+
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might
+well disagree on the order.  This disagreement stems from the fact that
+the weak memory-barrier instructions used to implement smp_load_acquire()
+and smp_store_release() are not required to order prior stores against
+subsequent loads in all cases.  This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though
+both cpu0() and cpu1() agree that these two operations occurred in the
+intended order.
+
+However, please keep in mind that smp_load_acquire() is not magic.
+In particular, it simply reads from its argument with ordering.  It does
+-not- ensure that any particular value will be read.  Therefore, the
+following outcome is possible:
+
+	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
+
+Note that this outcome can happen even on a mythical sequentially
+consistent system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
 
 
 ========================
-- 
cgit v1.2.3


From 37ef0341ca60b364dde05239c98b15c999195d8c Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Mon, 25 Jan 2016 22:12:34 -0800
Subject: documentation: Add alternative release-acquire outcome

The memory-barriers.txt discussion of local transitivity and
release-acquire chains leaves out discussion of the outcome of
the read from "u".  This commit therefore adds an outcome showing
that you can get a "1" from this read even if the release-acquire
pairs don't line up.

Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 4 ++++
 1 file changed, 4 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index ae9d306725ba..57e4a4b053c5 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1372,6 +1372,10 @@ is possible:
 
 	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
 
+As an aside, the following outcome is also possible:
+
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1
+
 Although cpu0(), cpu1(), and cpu2() will see their respective reads and
 writes in order, CPUs not involved in the release-acquire chain might
 well disagree on the order.  This disagreement stems from the fact that
-- 
cgit v1.2.3


From f36fe1e70b5477d4e42df8ea97278e9698dddbbf Mon Sep 17 00:00:00 2001
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Mon, 15 Feb 2016 14:50:36 -0800
Subject: documentation: Transitivity is not cumulativity

The "transitivity" section mentions cumulativity in a potentially
confusing way.  Contrary to the current wording, cumulativity is
not transitivity, but rather a hardware discipline that can be used
to implement transitivity on ARM and PowerPC CPUs.  This commit
therefore deletes the mention of cumulativity.

Reported-by: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/memory-barriers.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 57e4a4b053c5..8367d393cba2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1270,7 +1270,7 @@ TRANSITIVITY
 
 Transitivity is a deeply intuitive notion about ordering that is not
 always provided by real computer systems.  The following example
-demonstrates transitivity (also called "cumulativity"):
+demonstrates transitivity:
 
 	CPU 1			CPU 2			CPU 3
 	=======================	=======================	=======================
-- 
cgit v1.2.3


From 65f95ff2e41a32dd190cf28e3abb029625eef968 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj38.park@gmail.com>
Date: Mon, 22 Feb 2016 08:28:29 -0800
Subject: documentation: Clarify compiler store-fusion example

The compiler store-fusion example in memory-barriers.txt uses a C
comment to represent arbitrary code that does not update a given
variable.  Unfortunately, someone could reasonably interpret the
comment as instead referring to the following line of code.  This
commit therefore replaces the comment with a string that more
clearly represents the arbitrary code.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
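Illustrative note, not part of the commit: the store-fusion example as
compilable C.  WRITE_ONCE_INT() is a hypothetical, int-only stand-in for the
kernel's WRITE_ONCE(), implemented here with a volatile cast as an
assumption for this sketch.  The function unrelated_work() is invented to
play the role of the "code that does not store to variable a".

```c
/* Hypothetical int-only stand-in for the kernel's WRITE_ONCE(). */
#define WRITE_ONCE_INT(x, val)	(*(volatile int *)&(x) = (val))

static int a;
static int unrelated;

/* Invented placeholder: code that does not store to variable a. */
static void unrelated_work(void)
{
	unrelated++;
}

void plain_stores(void)
{
	a = 0;
	unrelated_work();
	a = 0;	/* The compiler may omit this store as redundant. */
}

void marked_stores(void)
{
	WRITE_ONCE_INT(a, 0);
	unrelated_work();
	WRITE_ONCE_INT(a, 0);	/* Volatile access: the store must be emitted. */
}
```

In single-threaded use both functions leave "a" equal to zero.  The
difference matters only when another CPU or an interrupt handler might have
stored to "a" between the two assignments.
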
 Documentation/memory-barriers.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 8367d393cba2..3729cbe60e41 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1550,7 +1550,7 @@ of optimizations:
      the following:
 
 	a = 0;
-	/* Code that does not store to variable a. */
+	... Code that does not store to variable a ...
 	a = 0;
 
      The compiler sees that the value of variable 'a' is already zero, so
@@ -1562,7 +1562,7 @@ of optimizations:
      wrong guess:
 
 	WRITE_ONCE(a, 0);
-	/* Code that does not store to variable a. */
+	... Code that does not store to variable a ...
 	WRITE_ONCE(a, 0);
 
  (*) The compiler is within its rights to reorder memory accesses unless
-- 
cgit v1.2.3