From 6b06546206868f723f2061d703a3c3c378dcbf4c Mon Sep 17 00:00:00 2001 From: "Dennis Zhou (Facebook)" Date: Fri, 31 Aug 2018 16:22:42 -0400 Subject: Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" This reverts commit 4c6994806f708559c2812b73501406e21ae5dcd0. Destroying blkgs is tricky because of the nature of the relationship. A blkg should go away when either a blkcg or a request_queue goes away. However, blkg's pin the blkcg to ensure they remain valid. To break this cycle, when a blkcg is offlined, blkgs put back their css ref. This eventually lets css_free() get called which frees the blkcg. The above commit (4c6994806f70) breaks this order of events by trying to destroy blkgs in css_free(). As the blkgs still hold references to the blkcg, css_free() is never called. The race between blkcg_bio_issue_check() and cgroup_rmdir() will be addressed in the following patch by delaying destruction of a blkg until all writeback associated with the blkcg has been finished. Fixes: 4c6994806f70 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()") Reviewed-by: Josef Bacik Signed-off-by: Dennis Zhou Cc: Jiufei Xue Cc: Joseph Qi Cc: Tejun Heo Cc: Jens Axboe Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 34aec30e06c7..1615cdd4c797 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -89,7 +89,6 @@ struct blkg_policy_data { /* the blkg and policy id this per-policy data belongs to */ struct blkcg_gq *blkg; int plid; - bool offline; }; /* -- cgit v1.2.3 From 59b57717fff8b562825d9d25e0180ad7e8048ca9 Mon Sep 17 00:00:00 2001 From: "Dennis Zhou (Facebook)" Date: Fri, 31 Aug 2018 16:22:43 -0400 Subject: blkcg: delay blkg destruction until after writeback has finished Currently, blkcg destruction relies on a sequence of events: 1. Destruction starts. blkcg_css_offline() is called and blkgs release their reference to the blkcg. This immediately destroys the cgwbs (writeback). 2. With blkgs giving up their reference, the blkcg ref count should become zero and eventually call blkcg_css_free() which finally frees the blkcg. Jiufei Xue reported that there is a race between blkcg_bio_issue_check() and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent on the completion of all writeback associated with the blkcg. A count of the number of cgwbs is maintained and once that goes to zero, blkg destruction can follow. This should prevent premature blkg destruction related to writeback. The new process for blkcg cleanup is as follows: 1. Destruction starts. blkcg_css_offline() is called which offlines writeback. Blkg destruction is delayed on the cgwb_refcnt count to avoid punting potentially large amounts of outstanding writeback to root while maintaining any ongoing policies. Here, the base cgwb_refcnt is put back. 2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called and handles destruction of blkgs. This is where the css reference held by each blkg is released. 3. Once the blkcg ref count goes to zero, blkcg_css_free() is called. This finally frees the blkg. It seems in the past blk-throttle didn't do the most understandable things with taking data from a blkg while associating with current. So, the simplification and unification of what blk-throttle is doing caused this. Fixes: 08e18eab0c579 ("block: add bi_blkg to the bio for cgroups") Reviewed-by: Josef Bacik Signed-off-by: Dennis Zhou Cc: Jiufei Xue Cc: Joseph Qi Cc: Tejun Heo Cc: Josef Bacik Cc: Jens Axboe Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 1615cdd4c797..6d766a19f2bb 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -56,6 +56,7 @@ struct blkcg { struct list_head all_blkcgs_node; #ifdef CONFIG_CGROUP_WRITEBACK struct list_head cgwb_list; + refcount_t cgwb_refcnt; #endif }; @@ -386,6 +387,49 @@ static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd) return cpd ? cpd->blkcg : NULL; } +extern void blkcg_destroy_blkgs(struct blkcg *blkcg); + +#ifdef CONFIG_CGROUP_WRITEBACK + +/** + * blkcg_cgwb_get - get a reference for blkcg->cgwb_list + * @blkcg: blkcg of interest + * + * This is used to track the number of active wb's related to a blkcg. + */ +static inline void blkcg_cgwb_get(struct blkcg *blkcg) +{ + refcount_inc(&blkcg->cgwb_refcnt); +} + +/** + * blkcg_cgwb_put - put a reference for @blkcg->cgwb_list + * @blkcg: blkcg of interest + * + * This is used to track the number of active wb's related to a blkcg. + * When this count goes to zero, all active wb has finished so the + * blkcg can continue destruction by calling blkcg_destroy_blkgs(). + * This work may occur in cgwb_release_workfn() on the cgwb_release + * workqueue. + */ +static inline void blkcg_cgwb_put(struct blkcg *blkcg) +{ + if (refcount_dec_and_test(&blkcg->cgwb_refcnt)) + blkcg_destroy_blkgs(blkcg); +} + +#else + +static inline void blkcg_cgwb_get(struct blkcg *blkcg) { } + +static inline void blkcg_cgwb_put(struct blkcg *blkcg) +{ + /* wb isn't being accounted, so trigger destruction right away */ + blkcg_destroy_blkgs(blkcg); +} + +#endif + /** * blkg_path - format cgroup path of blkg * @blkg: blkg of interest -- cgit v1.2.3