From 87354e5de04fe727227ff619af164202adcfa4d4 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 6 Jul 2017 07:02:21 -0400 Subject: buffer: set errors in mapping at the time that the error occurs I noticed on xfs that I could still sometimes get back an error on fsync on a fd that was opened after the error condition had been cleared. The problem is that the buffer code sets the write_io_error flag and then later checks that flag to set the error in the mapping. That flag perisists for quite a while however. If the file is later opened with O_TRUNC, the buffers will then be invalidated and the mapping's error set such that a subsequent fsync will return error. I think this is incorrect, as there was no writeback between the open and fsync. Add a new mark_buffer_write_io_error operation that sets the flag and the error in the mapping at the same time. Replace all calls to set_buffer_write_io_error with mark_buffer_write_io_error, and remove the places that check this flag in order to set the error in the mapping. This sets the error in the mapping earlier, at the time that it's first detected. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara Reviewed-by: Carlos Maiolino --- include/linux/buffer_head.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index bd029e52ef5e..e0abeba3ced7 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -149,6 +149,7 @@ void buffer_check_dirty_writeback(struct page *page, */ void mark_buffer_dirty(struct buffer_head *bh); +void mark_buffer_write_io_error(struct buffer_head *bh); void init_buffer(struct buffer_head *, bh_end_io_t *, void *); void touch_buffer(struct buffer_head *bh); void set_bh_page(struct buffer_head *bh, -- cgit v1.2.3 From 76341cabbdad65c10a4162e9dfa82a6342afc02f Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 6 Jul 2017 07:02:22 -0400 Subject: jbd2: don't clear and reset errors after waiting on writeback Resetting this flag is almost certainly racy, and will be problematic with some coming changes. Make filemap_fdatawait_keep_errors return int, but not clear the flag(s). Have jbd2 call it instead of filemap_fdatawait and don't attempt to re-set the error flag if it fails. Reviewed-by: Jan Kara Reviewed-by: Carlos Maiolino Signed-off-by: Jeff Layton --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/fs.h b/include/linux/fs.h index 803e5a9b2654..8ac8df1b3550 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2514,7 +2514,7 @@ extern int write_inode_now(struct inode *, int); extern int filemap_fdatawrite(struct address_space *); extern int filemap_flush(struct address_space *); extern int filemap_fdatawait(struct address_space *); -extern void filemap_fdatawait_keep_errors(struct address_space *); +extern int filemap_fdatawait_keep_errors(struct address_space *mapping); extern int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend); extern int filemap_write_and_wait(struct address_space *mapping); -- cgit v1.2.3 From 84cbadadc6eafc4798513773a2c8fce37dcd2fb8 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 6 Jul 2017 07:02:24 -0400 Subject: lib: add errseq_t type and infrastructure for handling it An errseq_t is a way of recording errors in one place, and allowing any number of "subscribers" to tell whether an error has been set again since a previous time. It's implemented as an unsigned 32-bit value that is managed with atomic operations. The low order bits are designated to hold an error code (max size of MAX_ERRNO). The upper bits are used as a counter. The API works with consumers sampling an errseq_t value at a particular point in time. Later, that value can be used to tell whether new errors have been set since that time. Note that there is a 1 in 512k risk of collisions here if new errors are being recorded frequently, since we have so few bits to use as a counter. To mitigate this, one bit is used as a flag to tell whether the value has been sampled since a new value was recorded. That allows us to avoid bumping the counter if no one has sampled it since it was last bumped. Later patches will build on this infrastructure to change how writeback errors are tracked in the kernel. Signed-off-by: Jeff Layton Reviewed-by: NeilBrown Reviewed-by: Jan Kara --- include/linux/errseq.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 include/linux/errseq.h (limited to 'include') diff --git a/include/linux/errseq.h b/include/linux/errseq.h new file mode 100644 index 000000000000..9e0d444ac88d --- /dev/null +++ b/include/linux/errseq.h @@ -0,0 +1,19 @@ +#ifndef _LINUX_ERRSEQ_H +#define _LINUX_ERRSEQ_H + +/* See lib/errseq.c for more info */ + +typedef u32 errseq_t; + +errseq_t __errseq_set(errseq_t *eseq, int err); +static inline void errseq_set(errseq_t *eseq, int err) +{ + /* Optimize for the common case of no error */ + if (unlikely(err)) + __errseq_set(eseq, err); +} + +errseq_t errseq_sample(errseq_t *eseq); +int errseq_check(errseq_t *eseq, errseq_t since); +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since); +#endif -- cgit v1.2.3 From 5660e13d2fd6af1903d4b0b98020af95ca2d638a Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 6 Jul 2017 07:02:25 -0400 Subject: fs: new infrastructure for writeback error handling and reporting Most filesystems currently use mapping_set_error and filemap_check_errors for setting and reporting/clearing writeback errors at the mapping level. filemap_check_errors is indirectly called from most of the filemap_fdatawait_* functions and from filemap_write_and_wait*. These functions are called from all sorts of contexts to wait on writeback to finish -- e.g. mostly in fsync, but also in truncate calls, getattr, etc. The non-fsync callers are problematic. We should be reporting writeback errors during fsync, but many places spread over the tree clear out errors before they can be properly reported, or report errors at nonsensical times. If I get -EIO on a stat() call, there is no reason for me to assume that it is because some previous writeback failed. The fact that it also clears out the error such that a subsequent fsync returns 0 is a bug, and a nasty one since that's potentially silent data corruption. This patch adds a small bit of new infrastructure for setting and reporting errors during address_space writeback. While the above was my original impetus for adding this, I think it's also the case that current fsync semantics are just problematic for userland. Most applications that call fsync do so to ensure that the data they wrote has hit the backing store. In the case where there are multiple writers to the file at the same time, this is really hard to determine. The first one to call fsync will see any stored error, and the rest get back 0. The processes with open fds may not be associated with one another in any way. They could even be in different containers, so ensuring coordination between all fsync callers is not really an option. One way to remedy this would be to track what file descriptor was used to dirty the file, but that's rather cumbersome and would likely be slow. However, there is a simpler way to improve the semantics here without incurring too much overhead. This set adds an errseq_t to struct address_space, and a corresponding one is added to struct file. Writeback errors are recorded in the mapping's errseq_t, and the one in struct file is used as the "since" value. This changes the semantics of the Linux fsync implementation such that applications can now use it to determine whether there were any writeback errors since fsync(fd) was last called (or since the file was opened in the case of fsync having never been called). Note that those writeback errors may have occurred when writing data that was dirtied via an entirely different fd, but that's the case now with the current mapping_set_error/filemap_check_error infrastructure. This will at least prevent you from getting a false report of success. The new behavior is still consistent with the POSIX spec, and is more reliable for application developers. This patch just adds some basic infrastructure for doing this, and ensures that the f_wb_err "cursor" is properly set when a file is opened. Later patches will change the existing code to use this new infrastructure for reporting errors at fsync time. Signed-off-by: Jeff Layton Reviewed-by: Jan Kara --- include/linux/fs.h | 60 +++++++++++++++++++++++++++++++++++++++++- include/trace/events/filemap.h | 57 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/fs.h b/include/linux/fs.h index 8ac8df1b3550..78b5c2901712 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -30,7 +30,7 @@ #include #include #include - +#include #include #include @@ -392,6 +392,7 @@ struct address_space { gfp_t gfp_mask; /* implicit gfp mask for allocations */ struct list_head private_list; /* ditto */ void *private_data; /* ditto */ + errseq_t wb_err; } __attribute__((aligned(sizeof(long)))); /* * On most architectures that alignment is already the case; but @@ -868,6 +869,7 @@ struct file { struct list_head f_tfile_llink; #endif /* #ifdef CONFIG_EPOLL */ struct address_space *f_mapping; + errseq_t f_wb_err; } __attribute__((aligned(4))); /* lest something weird decides that 2 is OK */ struct file_handle { @@ -2526,6 +2528,62 @@ extern int filemap_fdatawrite_range(struct address_space *mapping, loff_t start, loff_t end); extern int filemap_check_errors(struct address_space *mapping); +extern void __filemap_set_wb_err(struct address_space *mapping, int err); +extern int __must_check file_check_and_advance_wb_err(struct file *file); +extern int __must_check file_write_and_wait_range(struct file *file, + loff_t start, loff_t end); + +/** + * filemap_set_wb_err - set a writeback error on an address_space + * @mapping: mapping in which to set writeback error + * @err: error to be set in mapping + * + * When writeback fails in some way, we must record that error so that + * userspace can be informed when fsync and the like are called. We endeavor + * to report errors on any file that was open at the time of the error. Some + * internal callers also need to know when writeback errors have occurred. + * + * When a writeback error occurs, most filesystems will want to call + * filemap_set_wb_err to record the error in the mapping so that it will be + * automatically reported whenever fsync is called on the file. + * + * FIXME: mention FS_* flag here? + */ +static inline void filemap_set_wb_err(struct address_space *mapping, int err) +{ + /* Fastpath for common case of no error */ + if (unlikely(err)) + __filemap_set_wb_err(mapping, err); +} + +/** + * filemap_check_wb_error - has an error occurred since the mark was sampled? + * @mapping: mapping to check for writeback errors + * @since: previously-sampled errseq_t + * + * Grab the errseq_t value from the mapping, and see if it has changed "since" + * the given value was sampled. + * + * If it has then report the latest error set, otherwise return 0. + */ +static inline int filemap_check_wb_err(struct address_space *mapping, + errseq_t since) +{ + return errseq_check(&mapping->wb_err, since); +} + +/** + * filemap_sample_wb_err - sample the current errseq_t to test for later errors + * @mapping: mapping to be sampled + * + * Writeback errors are always reported relative to a particular sample point + * in the past. This function provides those sample points. + */ +static inline errseq_t filemap_sample_wb_err(struct address_space *mapping) +{ + return errseq_sample(&mapping->wb_err); +} + extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync); extern int vfs_fsync(struct file *file, int datasync); diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h index 42febb6bc1d5..ff91325b8123 100644 --- a/include/trace/events/filemap.h +++ b/include/trace/events/filemap.h @@ -10,6 +10,7 @@ #include #include #include +#include DECLARE_EVENT_CLASS(mm_filemap_op_page_cache, @@ -52,6 +53,62 @@ DEFINE_EVENT(mm_filemap_op_page_cache, mm_filemap_add_to_page_cache, TP_ARGS(page) ); +TRACE_EVENT(filemap_set_wb_err, + TP_PROTO(struct address_space *mapping, errseq_t eseq), + + TP_ARGS(mapping, eseq), + + TP_STRUCT__entry( + __field(unsigned long, i_ino) + __field(dev_t, s_dev) + __field(errseq_t, errseq) + ), + + TP_fast_assign( + __entry->i_ino = mapping->host->i_ino; + __entry->errseq = eseq; + if (mapping->host->i_sb) + __entry->s_dev = mapping->host->i_sb->s_dev; + else + __entry->s_dev = mapping->host->i_rdev; + ), + + TP_printk("dev=%d:%d ino=0x%lx errseq=0x%x", + MAJOR(__entry->s_dev), MINOR(__entry->s_dev), + __entry->i_ino, __entry->errseq) +); + +TRACE_EVENT(file_check_and_advance_wb_err, + TP_PROTO(struct file *file, errseq_t old), + + TP_ARGS(file, old), + + TP_STRUCT__entry( + __field(struct file *, file); + __field(unsigned long, i_ino) + __field(dev_t, s_dev) + __field(errseq_t, old) + __field(errseq_t, new) + ), + + TP_fast_assign( + __entry->file = file; + __entry->i_ino = file->f_mapping->host->i_ino; + if (file->f_mapping->host->i_sb) + __entry->s_dev = + file->f_mapping->host->i_sb->s_dev; + else + __entry->s_dev = + file->f_mapping->host->i_rdev; + __entry->old = old; + __entry->new = file->f_wb_err; + ), + + TP_printk("file=%p dev=%d:%d ino=0x%lx old=0x%x new=0x%x", + __entry->file, MAJOR(__entry->s_dev), + MINOR(__entry->s_dev), __entry->i_ino, __entry->old, + __entry->new) +); #endif /* _TRACE_FILEMAP_H */ /* This part must be outside protection */ -- cgit v1.2.3 From 8ed1e46aaf1bec6a12f4c89637f2c3ef4c70f18e Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 6 Jul 2017 07:02:26 -0400 Subject: mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error When a writeback error occurs, we want later callers to be able to pick up that fact when they go to wait on that writeback to complete. Traditionally, we've used AS_EIO/AS_ENOSPC flags to track that, but that's problematic since only one "checker" will be informed when an error occurs. In later patches, we're going to want to convert many of these callers to check for errors since a well-defined point in time. For now, ensure that we can handle both sorts of checks by both setting errors in both places when there is a writeback failure. Signed-off-by: Jeff Layton --- include/linux/pagemap.h | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 316a19f6b635..28acc94e0f81 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -28,14 +28,33 @@ enum mapping_flags { AS_NO_WRITEBACK_TAGS = 5, }; +/** + * mapping_set_error - record a writeback error in the address_space + * @mapping - the mapping in which an error should be set + * @error - the error to set in the mapping + * + * When writeback fails in some way, we must record that error so that + * userspace can be informed when fsync and the like are called. We endeavor + * to report errors on any file that was open at the time of the error. Some + * internal callers also need to know when writeback errors have occurred. + * + * When a writeback error occurs, most filesystems will want to call + * mapping_set_error to record the error in the mapping so that it can be + * reported when the application calls fsync(2). + */ static inline void mapping_set_error(struct address_space *mapping, int error) { - if (unlikely(error)) { - if (error == -ENOSPC) - set_bit(AS_ENOSPC, &mapping->flags); - else - set_bit(AS_EIO, &mapping->flags); - } + if (likely(!error)) + return; + + /* Record in wb_err for checkers using errseq_t based tracking */ + filemap_set_wb_err(mapping, error); + + /* Record it in flags for now, for legacy callers */ + if (error == -ENOSPC) + set_bit(AS_ENOSPC, &mapping->flags); + else + set_bit(AS_EIO, &mapping->flags); } static inline void mapping_set_unevictable(struct address_space *mapping) -- cgit v1.2.3