path: root/fs/btrfs/print-tree.c
Age  Commit message  Author
2009-06-10  Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE)  Yan Zheng
This commit introduces a new kind of back reference for btrfs metadata. Once a filesystem has been mounted with this commit, IT WILL NO LONGER BE MOUNTABLE BY OLDER KERNELS.
When a tree block in a subvolume tree is cow'd, the reference counts of all extents it points to are increased by one. At transaction commit time, the old root of the subvolume is recorded in a "dead root" data structure, and the btree it points to is later walked, dropping reference counts and freeing any blocks where the reference count goes to 0. The increments done during cow and the decrements done after commit cancel out, and the walk is a very expensive way to go about freeing the blocks that are no longer referenced by the new btree root.
This commit reduces the transaction overhead by avoiding the need for dead root records. When a non-shared tree block is cow'd, we free the old block at once, and the new block inherits the old block's references. When a tree block with reference count > 1 is cow'd, we increase the reference counts of all extents the new block points to by one, and decrease the old block's reference count by one.
This dead tree avoidance code removes the need to modify the reference counts of lower level extents when a non-shared tree block is cow'd. But we still need to update the back refs for all pointers in the block, because the location of the block is recorded in the back ref item. We can solve this by introducing a new type of back ref. The new back ref provides information about the pointer's key, its level, and the tree in which the pointer lives. This information allows us to find the pointer by searching the tree.
The shortcoming of the new back ref is that it only works for pointers in tree blocks referenced by their owner trees. This is mostly a problem for snapshots, where resolving one of these fuzzy back references would be O(number_of_snapshots) and quite slow. The solution used here is to use the fuzzy back references in the common case where a given tree block is only referenced by one root, and to use the full back references when multiple roots have a reference on a given block.
This commit adds a per-subvolume red-black tree to keep track of cached inodes. The red-black tree helps the balancing code find cached inodes whose inode numbers fall within a given range.
This commit improves the balancing code by introducing several data structures to keep the state of balancing. The most important one is the back ref cache, which caches how the upper level tree blocks are referenced. This greatly reduces the overhead of checking back refs. The improved balancing code scales significantly better with a large number of snapshots.
This is a very large commit and was written in a number of pieces. But they depend heavily on the disk format change and were squashed together to make sure git bisect didn't end up in a bad state wrt space balancing or the format change.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
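The copy-on-write reference accounting described in this commit message can be sketched roughly as follows. This is a toy in-memory model under stated assumptions, not the kernel code: `struct toy_block`, `toy_cow()` and the fixed fan-out `TOY_MAX_PTRS` are hypothetical names introduced only for illustration, and back ref updates are left out entirely.

```c
#include <stdlib.h>

#define TOY_MAX_PTRS 4	/* hypothetical fan-out, for illustration only */

/* Hypothetical in-memory stand-in for a tree block or extent. */
struct toy_block {
	int refs;				/* reference count */
	int nr_ptrs;				/* child pointers in use */
	struct toy_block *ptrs[TOY_MAX_PTRS];	/* extents this block points to */
};

/*
 * COW a block following the rule in the commit message:
 *  - non-shared (refs == 1): the copy inherits the old block's references,
 *    the old block is freed at once, and the children are not touched;
 *  - shared (refs > 1): every extent the copy points to gains one
 *    reference and the old block loses one.
 * Returns the copy, or NULL if allocation fails.
 */
struct toy_block *toy_cow(struct toy_block *old)
{
	struct toy_block *copy = malloc(sizeof(*copy));
	int i;

	if (!copy)
		return NULL;
	*copy = *old;
	copy->refs = 1;

	if (old->refs == 1) {
		free(old);
	} else {
		for (i = 0; i < copy->nr_ptrs; i++)
			copy->ptrs[i]->refs++;
		old->refs--;
	}
	return copy;
}
```

In this model the non-shared path never walks the children, which is the saving the commit message describes: no increments at cow time and no later dead root walk to undo them.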
2009-01-05  Btrfs: Fix checkpatch.pl warnings  Chris Mason
There were many; most are fixed now. struct-funcs.c still generates some warnings, but these are bogus.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-10-29  Btrfs: Add zlib compression support  Chris Mason
This is a large change for adding compression on reading and writing, both for inline and regular extents. It does some fairly large surgery to the writeback paths.
Compression is off by default and enabled by mount -o compress. Even when the -o compress mount option is not used, it is possible to read compressed extents off the disk.
If compression for a given set of pages fails to make them smaller, the file is flagged to avoid future compression attempts.
* While finding delalloc extents, the pages are locked before being sent down to the delalloc handler. This allows the delalloc handler to do complex things such as cleaning the pages, marking them writeback and starting IO on their behalf.
* Inline extents are inserted at delalloc time now. This allows us to compress the data before inserting the inline extent, and it allows us to insert an inline extent that spans multiple pages.
* All of the in-memory extent representations (extent_map.c, ordered-data.c etc) are changed to record both an in-memory size and an on disk size, as well as a flag for compression.
From a disk format point of view, the extent pointers in the file are changed to record the on disk size of a given extent and some encoding flags. Space in the disk format is allocated for compression encoding, as well as encryption and a generic 'other' field. Neither the encryption nor the 'other' field is currently used.
In order to limit the amount of data read for a single random read in the file, the size of a compressed extent is limited to 128k. This is a software only limit; the disk format supports u64 sized compressed extents.
In order to limit the ram consumed while processing extents, the uncompressed size of a compressed extent is limited to 256k. This is a software only limit and will be subject to tuning later.
Checksumming is still done on compressed extents, and it is done on the uncompressed version of the data. This way additional encodings can be layered on without having to figure out which encoding to checksum.
Compression happens at delalloc time, which is basically single threaded because it is usually done by a single pdflush thread. This makes it tricky to spread the compression load across all the cpus on the box. We'll have to look at parallel pdflush walks of dirty inodes at a later time.
Decompression is hooked into readpages and it does spread across CPUs nicely.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
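The size limits and the "stop trying" flag from this commit message can be illustrated with a small sketch. The names below (`struct toy_inode`, `toy_extent_input_len()`, `toy_keep_compressed()`, `toy_within_limits()`) are hypothetical and are not taken from the btrfs source; only the 128k and 256k figures and the flagging rule come from the message itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Software-only limits quoted in the commit message. */
#define TOY_MAX_COMPRESSED	(128 * 1024)	/* on-disk size of one extent */
#define TOY_MAX_UNCOMPRESSED	(256 * 1024)	/* in-memory size of one extent */

/* Hypothetical per-file state: set once compression has failed to help. */
struct toy_inode {
	bool nocompress;
};

/* How many input bytes may be fed into a single compressed extent. */
size_t toy_extent_input_len(size_t remaining)
{
	return remaining < TOY_MAX_UNCOMPRESSED ? remaining : TOY_MAX_UNCOMPRESSED;
}

/* True if an extent with these sizes respects both limits. */
bool toy_within_limits(size_t ram_bytes, size_t disk_bytes)
{
	return ram_bytes <= TOY_MAX_UNCOMPRESSED &&
	       disk_bytes <= TOY_MAX_COMPRESSED;
}

/*
 * Decide whether to keep a compression result. If the output is not
 * smaller than the input, flag the file so later writes skip compression.
 */
bool toy_keep_compressed(struct toy_inode *inode, size_t in_len, size_t out_len)
{
	if (inode->nocompress)
		return false;
	if (out_len >= in_len) {
		inode->nocompress = true;
		return false;
	}
	return true;
}
```

A caller would split a dirty range into pieces of at most `toy_extent_input_len()` bytes, compress each piece, and fall back to uncompressed writes once `toy_keep_compressed()` has flagged the file.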
2008-10-09  Btrfs: Remove offset field from struct btrfs_extent_ref  Yan Zheng
The offset field in struct btrfs_extent_ref records the position inside the file at which the file extent is referenced. In the new back reference system, tree leaves holding references to file extents are recorded explicitly. We can scan these tree leaves very quickly, so the offset field is not required.
This patch also makes the back reference system check the objectid when extents are being deleted.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2008-09-25  Btrfs: Full back reference support  Zheng Yan
This patch makes the back reference system explicitly record the location of the parent node for all types of extents. The location of the parent node is placed into the offset field of the backref key. Every time a tree block is balanced, the back references for the affected lower level extents are updated.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
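As a rough illustration of a back ref key that carries the parent's location in its offset field, consider the sketch below. The struct layout, the placeholder type value, and the choice of the extent's start as the objectid are assumptions made for this example; the commit message only states that the parent location goes into the offset field.

```c
#include <stdint.h>

#define TOY_EXTENT_REF_KEY 1	/* placeholder value, not the real key type */

/* Hypothetical (objectid, type, offset) key triple. */
struct toy_ref_key {
	uint64_t objectid;	/* assumed: start of the referenced extent */
	uint8_t type;
	uint64_t offset;	/* location of the parent node */
};

/* Build the back ref key for an extent referenced from a given parent. */
struct toy_ref_key toy_backref_key(uint64_t extent_start, uint64_t parent_start)
{
	struct toy_ref_key key = { extent_start, TOY_EXTENT_REF_KEY, parent_start };

	return key;
}

/*
 * When balancing moves the pointer into a different parent block, the
 * back ref has to be re-keyed with the new parent's location.
 */
struct toy_ref_key toy_backref_after_balance(struct toy_ref_key old,
					     uint64_t new_parent_start)
{
	old.offset = new_parent_start;
	return old;
}
```

Because the parent location is part of the key, every balance that moves a pointer implies a back ref update for the lower level extent, which is the cost the message mentions.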
2008-09-25  Btrfs: implement memory reclaim for leaf reference cache  Yan
The memory reclaiming issue happens when snapshots exist. In that case, some cache entries may not be used while an old snapshot is being dropped, so they remain in the cache until umount. The patch adds a field to struct btrfs_leaf_ref to record the creation time. In addition, the patch links all dead roots of a given snapshot together in order of creation time. After an old snapshot has been completely dropped, we check the dead root list and remove all cache entries created before the oldest dead root in the list.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
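The reclaim rule above (drop every cache entry created before the oldest remaining dead root) can be sketched with a flat array. `struct toy_leaf_ref` and `toy_reclaim()` are hypothetical names, the real cache is not an array, and treating the creation time as a plain integer is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: only the creation time matters here. */
struct toy_leaf_ref {
	uint64_t created;
};

/*
 * Remove all entries created before the oldest dead root still on the
 * list. Compacts the array in place and returns how many entries remain.
 */
size_t toy_reclaim(struct toy_leaf_ref *refs, size_t n, uint64_t oldest_dead_root)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		if (refs[i].created >= oldest_dead_root)
			refs[kept++] = refs[i];
	}
	return kept;
}
```

Keeping the dead roots ordered by creation time means the cutoff is simply the first element of that list, so the check after each snapshot drop stays cheap.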
2008-09-25  Btrfs: Pass down the expected generation number when reading tree blocks  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: Add chunk uuids and update multi-device back references  Chris Mason
Block headers now store the chunk tree uuid
Chunk items record the device uuid for each stripe
Device extent items record better back refs to the chunk tree
Block groups record better back refs to the chunk tree
The chunk tree format has also changed. The objectid of BTRFS_CHUNK_ITEM_KEY used to be the logical offset of the chunk. Now it is a chunk tree id, with the logical offset being stored in the offset field of the key. This allows a single chunk tree to record multiple logical address spaces, upping the number of bytes indexed by a chunk tree from 2^64 to 2^128.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
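The new chunk key layout can be pictured with the following sketch. `struct toy_chunk_key`, `toy_chunk_key()` and the placeholder type value are hypothetical stand-ins, not the kernel's definitions; only the assignment of the chunk tree id to objectid and the logical offset to offset comes from the commit message.

```c
#include <stdint.h>

#define TOY_CHUNK_ITEM_KEY 1	/* placeholder, not the real BTRFS_CHUNK_ITEM_KEY value */

/* Hypothetical (objectid, type, offset) key triple. */
struct toy_chunk_key {
	uint64_t objectid;	/* new format: chunk tree id */
	uint8_t type;
	uint64_t offset;	/* new format: logical offset of the chunk */
};

/* Build a chunk item key in the new format. */
struct toy_chunk_key toy_chunk_key(uint64_t chunk_tree_id, uint64_t logical)
{
	struct toy_chunk_key key = { chunk_tree_id, TOY_CHUNK_ITEM_KEY, logical };

	return key;
}
```

With two independent 64-bit fields, each chunk tree id selects its own 2^64-byte logical address space, which is where the 2^64 x 2^64 = 2^128 figure in the message comes from.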
2008-09-25  Btrfs: Move device information into the super block so it can be scanned  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: Add support for multiple devices per filesystem  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: Add back pointers from extents to the btree or file referencing them  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  xattr support for btrfs  Josef Bacik
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: Allow tree blocks larger than the page size  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25  Btrfs: Create extent_buffer interface for large blocksizes  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-07-11  Btrfs: trivial include fixups  Zach Brown
Almost none of the files including module.h need to do so, so remove those includes. Include sched.h in extent-tree.c to silence a warning about cond_resched() being undeclared.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-18  Subject: Rework btrfs_file_write to only allocate while page locks are held  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12  Btrfs: add GPLv2  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-06-12  Btrfs: printk fixes  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-05-02  Btrfs: fix page cache memory leak  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-26  Btrfs: start of block group code  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-18  Btrfs: working file_write, reorganized key flags  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-10  Btrfs: drop owner and parentid  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-10  Btrfs: drop the inode map tree  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-04-06  Btrfs: start of support for many FS volumes  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-22  Mountable btrfs, with readdir  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-21  Btrfs: initial move to kernel module land  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-20  Btrfs: change dir-test to insert inode_items  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-20  Btrfs: Add inode map, and the start of file extent items  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-16  Btrfs: add a name_len to dir items, reorder key  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-15  Btrfs: directory testing code and dir item fixes  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-15  Btrfs: Use a chunk of the key flags to record the item type.  Chris Mason
Add (untested and simple) directory item code
Fix comp_keys to use the new key ordering
Add btrfs_insert_empty_item
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-14  Btrfs: reorder key offset and flags  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-14  Btrfs: variable block size support  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-14  Btrfs: add leaf data casting helper  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-13  Btrfs: Change the super to point to a tree of trees to enable persistent snapshots  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-13  rename funcs and structs to btrfs  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-13  Btrfs: node->blockptrs endian fixes  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-12  Btrfs: struct item endian fixes  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-12  Btrfs: get/set for struct header fields  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-03-06  Btrfs: Fixup reference counting on cows  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-02-26  Btrfs: more 32 bit cleanups  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-02-26  Btrfs: 32bit cleanups  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-02-26  Btrfs: u64 cleanups  Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2007-02-24  Btrfs: Break up ctree.c a little  Chris Mason
Extent fixes
Signed-off-by: Chris Mason <chris.mason@oracle.com>