From 5cc415001bca8fe0e3f0ee6d58a953a314dd9751 Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Mon, 6 Apr 2020 12:41:52 +0200 Subject: Drivers: hv: avoid passing opaque pointer to vmbus_onmessage() vmbus_onmessage() doesn't need the header of the message, it only uses it to get to the payload, we can pass the pointer to the payload directly. Signed-off-by: Vitaly Kuznetsov Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20200406104154.45010-4-vkuznets@redhat.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 692c89ccf5df..cbd24f4e68d1 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1017,7 +1017,7 @@ static inline void clear_low_latency_mode(struct vmbus_channel *c) c->low_latency = false; } -void vmbus_onmessage(void *context); +void vmbus_onmessage(struct vmbus_channel_message_header *hdr); int vmbus_request_offers(void); -- cgit v1.2.3 From 8b6a877c060ed6b86878fe66c7c6493a6054cf23 Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Mon, 6 Apr 2020 02:15:06 +0200 Subject: Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels When Hyper-V sends an interrupt to the guest, the guest has to figure out which channel the interrupt is associated with. Hyper-V sets a bit in a memory page that is shared with the guest, indicating a particular "relid" that the interrupt is associated with. The current Linux code then uses a set of per-CPU linked lists to map a given "relid" to a pointer to a channel structure. This design introduces a synchronization problem if the CPU that Hyper-V will interrupt for a certain channel is changed. If the interrupt comes on the "old CPU" and the channel was already moved to the per-CPU list of the "new CPU", then the relid -> channel mapping will fail and the interrupt is dropped. Similarly, if the interrupt comes on the new CPU but the channel was not moved to the per-CPU list of the new CPU, then the mapping will fail and the interrupt is dropped. Relids are integers ranging from 0 to 2047. The mapping from relids to channel structures can be done by setting up an array with 2048 entries, each entry being a pointer to a channel structure (hence total size ~16K bytes, which is not a problem). The array is global, so there are no per-CPU linked lists to update. The array can be searched and updated by loading from/storing to the array at the specified index. With no per-CPU data structures, the above mentioned synchronization problem is avoided and the relid2channel() function gets simpler. Suggested-by: Michael Kelley Signed-off-by: Andrea Parri (Microsoft) Link: https://lore.kernel.org/r/20200406001514.19876-4-parri.andrea@gmail.com Reviewed-by: Michael Kelley Signed-off-by: Wei Liu --- include/linux/hyperv.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index cbd24f4e68d1..f5b3f008c55a 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -854,11 +854,6 @@ struct vmbus_channel { * Support per-channel state for use by vmbus drivers. */ void *per_channel_state; - /* - * To support per-cpu lookup mapping of relid to channel, - * link up channels based on their CPU affinity. - */ - struct list_head percpu_list; /* * Defer freeing channel until after all cpu's have -- cgit v1.2.3 From 9403b66e6161130ebae7e55a97491c84c1ad6f9f Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Mon, 6 Apr 2020 02:15:09 +0200 Subject: Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal Since vmbus_chan_sched() dereferences the ring buffer pointer, we have to make sure that the ring buffer data structures don't get freed while such dereferencing is happening. Current code does this by sending an IPI to the CPU that is allowed to access that ring buffer from interrupt level, cf., vmbus_reset_channel_cb(). But with the new functionality to allow changing the CPU that a channel will interrupt, we can't be sure what CPU will be running the vmbus_chan_sched() function for a particular channel, so the current IPI mechanism is infeasible. Instead synchronize vmbus_chan_sched() and vmbus_reset_channel_cb() by using the (newly introduced) per-channel spin lock "sched_lock". Move the test for onchannel_callback being NULL before the "switch" control statement in vmbus_chan_sched(), in order to not access the ring buffer if the vmbus_reset_channel_cb() has been completed on the channel. Suggested-by: Michael Kelley Signed-off-by: Andrea Parri (Microsoft) Link: https://lore.kernel.org/r/20200406001514.19876-7-parri.andrea@gmail.com Reviewed-by: Michael Kelley Signed-off-by: Wei Liu --- include/linux/hyperv.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index f5b3f008c55a..62b2c11d0875 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -771,6 +771,12 @@ struct vmbus_channel { void (*onchannel_callback)(void *context); void *channel_callback_context; + /* + * Synchronize channel scheduling and channel removal; see the inline + * comments in vmbus_chan_sched() and vmbus_reset_channel_cb(). + */ + spinlock_t sched_lock; + /* * A channel can be marked for one of three modes of reading: * BATCHED - callback called from taslket and should read -- cgit v1.2.3 From 8ef4c4abbbcdcd9d4bc0fd9454df03e6dac24b73 Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Mon, 6 Apr 2020 02:15:11 +0200 Subject: Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic The logic is unused since commit 509879bdb30b8 ("Drivers: hv: Introduce a policy for controlling channel affinity"). This logic assumes that a channel target_cpu doesn't change during the lifetime of a channel, but this assumption is incompatible with the new functionality that allows changing the vCPU a channel will interrupt. Signed-off-by: Andrea Parri (Microsoft) Link: https://lore.kernel.org/r/20200406001514.19876-9-parri.andrea@gmail.com Reviewed-by: Michael Kelley Signed-off-by: Wei Liu --- include/linux/hyperv.h | 27 --------------------------- 1 file changed, 27 deletions(-) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 62b2c11d0875..247356dbd742 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -689,11 +689,6 @@ union hv_connection_id { } u; }; -enum hv_numa_policy { - HV_BALANCED = 0, - HV_LOCALIZED, -}; - enum vmbus_device_type { HV_IDE = 0, HV_SCSI, @@ -808,10 +803,6 @@ struct vmbus_channel { u32 target_vp; /* The corresponding CPUID in the guest */ u32 target_cpu; - /* - * State to manage the CPU affiliation of channels. - */ - struct cpumask alloced_cpus_in_node; int numa_node; /* * Support for sub-channels. For high performance devices, @@ -898,18 +889,6 @@ struct vmbus_channel { */ bool low_latency; - /* - * NUMA distribution policy: - * We support two policies: - * 1) Balanced: Here all performance critical channels are - * distributed evenly amongst all the NUMA nodes. - * This policy will be the default policy. - * 2) Localized: All channels of a given instance of a - * performance critical service will be assigned CPUs - * within a selected NUMA node. - */ - enum hv_numa_policy affinity_policy; - bool probe_done; /* @@ -965,12 +944,6 @@ static inline bool is_sub_channel(const struct vmbus_channel *c) return c->offermsg.offer.sub_channel_index != 0; } -static inline void set_channel_affinity_state(struct vmbus_channel *c, - enum hv_numa_policy policy) -{ - c->affinity_policy = policy; -} - static inline void set_channel_read_mode(struct vmbus_channel *c, enum hv_callback_mode mode) { -- cgit v1.2.3 From 7527810573436f00e582d3d5ef2eb3c027c98d7d Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Mon, 6 Apr 2020 02:15:13 +0200 Subject: Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type VMBus version 4.1 and later support the CHANNELMSG_MODIFYCHANNEL(22) message type which can be used to request Hyper-V to change the vCPU that a channel will interrupt. Introduce the CHANNELMSG_MODIFYCHANNEL message type, and define the vmbus_send_modifychannel() function to send CHANNELMSG_MODIFYCHANNEL requests to the host via a hypercall. The function is then used to define a sysfs "store" operation, which allows to change the (v)CPU the channel will interrupt by using the sysfs interface. The feature can be used for load balancing or other purposes. One interesting catch here is that Hyper-V can *not* currently ACK CHANNELMSG_MODIFYCHANNEL messages with the promise that (after the ACK is sent) the channel won't send any more interrupts to the "old" CPU. The peculiarity of the CHANNELMSG_MODIFYCHANNEL messages is problematic if the user want to take a CPU offline, since we don't want to take a CPU offline (and, potentially, "lose" channel interrupts on such CPU) if the host is still processing a CHANNELMSG_MODIFYCHANNEL message associated to that CPU. It is worth mentioning, however, that we have been unable to observe the above mentioned "race": in all our tests, CHANNELMSG_MODIFYCHANNEL requests appeared *as if* they were processed synchronously by the host. Suggested-by: Michael Kelley Signed-off-by: Andrea Parri (Microsoft) Link: https://lore.kernel.org/r/20200406001514.19876-11-parri.andrea@gmail.com Reviewed-by: Michael Kelley [ wei: fix conflict in channel_mgmt.c ] Signed-off-by: Wei Liu --- include/linux/hyperv.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 247356dbd742..b85d7580f2c1 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -425,7 +425,7 @@ enum vmbus_channel_message_type { CHANNELMSG_19 = 19, CHANNELMSG_20 = 20, CHANNELMSG_TL_CONNECT_REQUEST = 21, - CHANNELMSG_22 = 22, + CHANNELMSG_MODIFYCHANNEL = 22, CHANNELMSG_TL_CONNECT_RESULT = 23, CHANNELMSG_COUNT }; @@ -620,6 +620,13 @@ struct vmbus_channel_tl_connect_request { guid_t host_service_id; } __packed; +/* Modify Channel parameters, cf. vmbus_send_modifychannel() */ +struct vmbus_channel_modifychannel { + struct vmbus_channel_message_header header; + u32 child_relid; + u32 target_vp; +} __packed; + struct vmbus_channel_version_response { struct vmbus_channel_message_header header; u8 version_supported; @@ -1505,6 +1512,7 @@ extern __u32 vmbus_proto_version; int vmbus_send_tl_connect_request(const guid_t *shv_guest_servie_id, const guid_t *shv_host_servie_id); +int vmbus_send_modifychannel(u32 child_relid, u32 target_vp); void vmbus_set_event(struct vmbus_channel *channel); /* Get the start of the ring buffer. */ -- cgit v1.2.3 From 7769e18c201aa88eade5556faf9da7f2bc15bb8a Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Mon, 6 Apr 2020 02:15:14 +0200 Subject: scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned For each storvsc_device, storvsc keeps track of the channel target CPUs associated to the device (alloced_cpus) and it uses this information to fill a "cache" (stor_chns) mapping CPU->channel according to a certain heuristic. Update the alloced_cpus mask and the stor_chns array when a channel of the storvsc device is re-assigned to a different CPU. Signed-off-by: Andrea Parri (Microsoft) Cc: "James E.J. Bottomley" Cc: "Martin K. Petersen" Cc: Link: https://lore.kernel.org/r/20200406001514.19876-12-parri.andrea@gmail.com Reviewed-by; Long Li Reviewed-by: Michael Kelley [ wei: fix a small issue reported by kbuild test robot ] Signed-off-by: Wei Liu --- include/linux/hyperv.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index b85d7580f2c1..cd64ab7bb3b7 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -773,6 +773,9 @@ struct vmbus_channel { void (*onchannel_callback)(void *context); void *channel_callback_context; + void (*change_target_cpu_callback)(struct vmbus_channel *channel, + u32 old, u32 new); + /* * Synchronize channel scheduling and channel removal; see the inline * comments in vmbus_chan_sched() and vmbus_reset_channel_cb(). -- cgit v1.2.3 From db5871e85533f06ffe1795c1cb9b9153860d1e4f Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Thu, 7 May 2020 13:53:23 -0500 Subject: vmbus: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva Link: https://lore.kernel.org/r/20200507185323.GA14416@embeddedor Signed-off-by: Wei Liu --- include/linux/hyperv.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index cd64ab7bb3b7..d783847d8cb4 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -117,7 +117,7 @@ struct hv_ring_buffer { * Ring data starts here + RingDataStartOffset * !!! DO NOT place any fields below this !!! */ - u8 buffer[0]; + u8 buffer[]; } __packed; struct hv_ring_buffer_info { @@ -313,7 +313,7 @@ struct vmadd_remove_transfer_page_set { struct gpa_range { u32 byte_count; u32 byte_offset; - u64 pfn_array[0]; + u64 pfn_array[]; }; /* @@ -563,7 +563,7 @@ struct vmbus_channel_gpadl_header { u32 gpadl; u16 range_buflen; u16 rangecount; - struct gpa_range range[0]; + struct gpa_range range[]; } __packed; /* This is the followup packet that contains more PFNs. */ @@ -571,7 +571,7 @@ struct vmbus_channel_gpadl_body { struct vmbus_channel_message_header header; u32 msgnumber; u32 gpadl; - u64 pfn[0]; + u64 pfn[]; } __packed; struct vmbus_channel_gpadl_created { @@ -679,7 +679,7 @@ struct vmbus_channel_msginfo { * The channel message that goes out on the "wire". * It will contain at minimum the VMBUS_CHANNEL_MESSAGE_HEADER header */ - unsigned char msg[0]; + unsigned char msg[]; }; struct vmbus_close_msg { -- cgit v1.2.3 From afaa33da08abd10be8978781d7c99a9e67d2bbff Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Fri, 22 May 2020 19:19:01 +0200 Subject: Drivers: hv: vmbus: Resolve more races involving init_vp_index() init_vp_index() uses the (per-node) hv_numa_map[] masks to record the CPUs allocated for channel interrupts at a given time, and distribute the performance-critical channels across the available CPUs: in part., the mask of "candidate" target CPUs in a given NUMA node, for a newly offered channel, is determined by XOR-ing the node's CPU mask and the node's hv_numa_map. This operation/mechanism assumes that no offline CPUs is set in the hv_numa_map mask, an assumption that does not hold since such mask is currently not updated when a channel is removed or assigned to a different CPU. To address the issues described above, this adds hooks in the channel removal path (hv_process_channel_removal()) and in target_cpu_store() in order to clear, resp. to update, the hv_numa_map[] masks as needed. This also adds a (missed) update of the masks in init_vp_index() (cf., e.g., the memory-allocation failure path in this function). Like in the case of init_vp_index(), such hooks require to determine if the given channel is performance critical. init_vp_index() does this by parsing the channel's offer, it can not rely on the device data structure (device_obj) to retrieve such information because the device data structure has not been allocated/linked with the channel by the time that init_vp_index() executes. A similar situation may hold in hv_is_alloced_cpu() (defined below); the adopted approach is to "cache" the device type of the channel, as computed by parsing the channel's offer, in the channel structure itself. Fixes: 7527810573436f ("Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type") Signed-off-by: Andrea Parri (Microsoft) Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20200522171901.204127-3-parri.andrea@gmail.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux/hyperv.h') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index d783847d8cb4..40df3103e890 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -901,6 +901,13 @@ struct vmbus_channel { bool probe_done; + /* + * Cache the device ID here for easy access; this is useful, in + * particular, in situations where the channel's device_obj has + * not been allocated/initialized yet. + */ + u16 device_id; + /* * We must offload the handling of the primary/sub channels * from the single-threaded vmbus_connection.work_queue to -- cgit v1.2.3