summaryrefslogtreecommitdiff
path: root/include/linux/ceph/ceph_fs.h
AgeCommit message (Collapse)Author
2017-09-06ceph: more accurate statfsDouglas Fuller
Improve accuracy of statfs reporting for Ceph filesystems comprising exactly one data pool. In this case, the Ceph monitor can now report the space usage for the single data pool instead of the global data for the entire Ceph cluster. Include support for this message in mon_client and leverage it in ceph/super. Signed-off-by: Douglas Fuller <dfuller@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-09-06ceph: new cap message flags indicate if there is pending capsnapYan, Zheng
These flags tell mds if there is pending capsnap explicitly. Without this explicit notification, mds can only conclude if client has pending capsnap. The method mds use is inefficient and error-prone. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-07-07libceph: respect RADOS_BACKOFF backoffsIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-05-04ceph: fix file open flags on ppc64Alexander Graf
The file open flags (O_foo) are platform specific and should never go out to an interface that is not local to the system. Unfortunately these flags have leaked out onto the wire in the cephfs implementation. That lead to bogus flags getting transmitted on ppc64. This patch converts the kernel view of flags to the ceph view of file open flags. Fixes: 124e68e74 ("ceph: file operations") Signed-off-by: Alexander Graf <agraf@suse.de> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-05-04ceph: make seeky readdir more efficientYan, Zheng
Current cephfs client uses string to indicate start position of readdir. The string is last entry of previous readdir reply. This approach does not work for seeky readdir because we can not easily convert the new postion to a string. For seeky readdir, mds needs to return dentries from the beginning. Client keeps retrying if the reply does not contain the dentry it wants. In current version of ceph, mds sorts CDentry in its cache in hash order. Client also uses dentry hash to compose dir postion. For seeky readdir, if client passes the hash part of dir postion to mds. mds can avoid replying useless dentries. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-12ceph: add flags parameter to send_cap_msgJeff Layton
Add a flags parameter to send_cap_msg, so we can request expedited service from the MDS when we know we'll be waiting on the result. Set that flag in the case of try_flush_caps. The callers of that function generally wait synchronously on the result, so it's beneficial to ask the server to expedite it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-10-03ceph: handle CEPH_SESSION_REJECT messageYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-08-24libceph: support for blacklisting clientsDouglas Fuller
Reuse ceph_mon_generic_request infrastructure for sending monitor commands. In particular, add support for 'blacklist add' to prevent other, non-responsive clients from making further updates. Signed-off-by: Douglas Fuller <dfuller@redhat.com> [idryomov@gmail.com: refactor, misc fixes throughout] Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-07-28libceph: fsmap.user subscription supportYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28ceph: reduce i_nr_by_mode array sizeYan, Zheng
Track usage count for individual fmode bit. This can reduce the array size by half. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-07-28libceph: rados pool namespace supportYan, Zheng
Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: Yan, Zheng <zyan@redhat.com> [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-07-28libceph: define new ceph_file_layout structureYan, Zheng
Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: using hash value to compose dentry offsetYan, Zheng
If MDS sorts dentries in dirfrag in hash order, we use hash value to compose dentry offset. dentry offset is: (0xff << 52) | ((24 bits hash) << 28) | (the nth entry hash hash collision) This offset is stable across directory fragmentation. This alos means there is no need to reset readdir offset if directory get fragmented in the middle of readdir. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26ceph: define 'end/complete' in readdir reply as bit flagsYan, Zheng
Set a flag in readdir request, which indicates that client interprets 'end/complete' as bit flags. So that mds can reply additional flags in readdir reply. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26libceph: support for subscribing to "mdsmap.<id>" mapsIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26libceph, rbd: ceph_osd_linger_request, watch/notify v2Ilya Dryomov
This adds support and switches rbd to a new, more reliable version of watch/notify protocol. As with the OSD client update, this is mostly about getting the right structures linked into the right places so that reconnects are properly sent when needed. watch/notify v2 also requires sending regular pings to the OSDs - send_linger_ping(). A major change from the old watch/notify implementation is the introduction of ceph_osd_linger_request - linger requests no longer piggy back on ceph_osd_request. ceph_osd_event has been merged into ceph_osd_linger_request. All the details are now hidden within libceph, the interface consists of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack(). ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep the lifetime management simple. ceph_osdc_notify_ack() accepts an optional data payload, which is relayed back to the notifier. Portions of this patch are loosely based on work by Douglas Fuller <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25ceph: fix security xattr deadlockYan, Zheng
When security is enabled, security module can call filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by MDS' dispatch thread, while handling MDS reply. If the MDS reply does not include xattrs and corresponding caps, getxattr/setxattr need to send a new request to MDS and waits for the reply. This makes MDS' dispatch sleep, nobody handles later MDS replies. The fix is make sure lookup/atomic_open reply include xattrs and corresponding caps. So getxattr can be handled by cached xattrs. This requires some modification to both MDS and request message. (Client tells MDS what caps it wants; MDS encodes proper caps in the reply) Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL to us. So just make setxattr return error when called by MDS' dispatch thread. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25libceph: revamp subs code, switch to SUBSCRIBE2 protocolIlya Dryomov
It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not. Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010). Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling in before we validate the epoch and successfully install the new map can mess up mon_client sub state. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-04-22ceph: rename snapshot supportYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: handle SESSION_FORCE_RO messageYan, Zheng
mark session as readonly and wake up all cap waiters. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19libceph: nuke pool op infrastructureIlya Dryomov
On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil <sage@newdream.net> wrote: > On Mon, 22 Dec 2014, Ilya Dryomov wrote: >> Actually, pool op stuff has been unused for over two years - looks like >> it was added for rbd create_snap and that got ripped out in 2012. It's >> unlikely we'd ever need to manage pools or snaps from the kernel client >> so I think it makes sense to nuke it. Sage? > > Yep! Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17ceph: use getattr request to fetch inline dataYan, Zheng
Add a new parameter 'locked_page' to ceph_do_getattr(). If inline data in getattr reply will be copied to the page. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17ceph: add inline data to pagecacheYan, Zheng
Request reply and cap message can contain inline data. add inline data to the page cache if there is Fc cap. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17ceph: fix file lock interruptionYan, Zheng
When a lock operation is interrupted, current code sends a unlock request to MDS to undo the lock operation. This method does not work as expected because the unlock request can drop locks that have already been acquired. The fix is use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR requests to interrupt blocked file lock request. These requests do not drop locks that have alread been acquired, they only interrupt blocked file lock request. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-06-06ceph: update inode fields according to issued capsYan, Zheng
Cap message and request reply from non-auth MDS may carry stale information (corresponding locks are in LOCK states) even they have the newest inode version. So client should update inode fields according to issued caps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-04ceph: use fl->fl_file as owner identifier of flock and posix lockYan, Zheng
flock and posix lock should use fl->fl_file instead of process ID as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner is usually equal to fl->fl_file, but it also can be a customized value). The process ID of who holds the lock is just for F_GETLK fcntl(2). The fix is rename the 'pid' fields of struct ceph_mds_request_args and struct ceph_filelock to 'owner', rename 'pid_namespace' fields to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages. We also set the most significant bit of the 'owner' field. MDS can use that bit to distinguish between old and new clients. The MDS counterpart of this patch modifies the flock code to not take the 'pid_namespace' into consideration when checking conflict locks. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-04-03ceph: add get_name() NFS export callbackYan, Zheng
Use the newly introduced LOOKUPNAME MDS request to connect child inode to its parent directory. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-17ceph: remove xattr when null value is given to setxattr()Yan, Zheng
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE to distinguish null value case from the zero-length value case. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-01-27libceph: move ceph_file_layout helpers to ceph_fs.hIlya Dryomov
Move ceph_file_layout helper macros and inline functions to ceph_fs.h. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-21ceph: remove exported caps when handling cap import messageYan, Zheng
Version 3 cap import message includes the ID of the exported caps. It allow us to remove the exported caps if we still haven't received the corresponding cap export message. We remove the exported caps because they are stale, keeping them can compromise consistence. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-01-21ceph: handle session flush messageYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2013-02-18libceph: update ceph_fs.hAlex Elder
Update most of "include/linux/ceph/ceph_fs.h" to match its user space counterpart in "src/include/ceph_fs.h" in the ceph tree. Everything that has changed is either: - added definitions (therefore no real effect on existing code) - deleting unused symbols - added or revised comments There were some differences between the struct definitions for ceph_mon_subscribe_item and the open field of ceph_mds_request_args; those differences remain. This and the next commit resolve: http://tracker.ceph.com/issues/4165 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2012-10-02UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel ↵David Howells
system headers Convert #include "..." to #include <path/...> in kernel system headers. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-07-30libceph: move feature bits to separate headerSage Weil
This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-05-07ceph: drop support for preferred_osd pgsSage Weil
This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
2011-05-24ceph: use LOOKUPINO to make unconnected nfs fh more reliableSage Weil
If we are unable to locate an inode by ino, ask the MDS using the new LOOKUPINO command. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21ceph: update common header filesYehuda Sadeh
This updates the common header files used by the different ceph related modules. Specifically it adds definitions required by the rbd watch/notify feature. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-01-12ceph: add dir_layout to inodeSage Weil
Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl.Greg Farnum
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20ceph: factor out libceph from Ceph file systemYehuda Sadeh
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>