Creating sparse block devices
I've not found much material online about creating sparse filesystems, and it's a useful thing to know if you're doing filesystem testing, or just want to pretend that you have more storage than you do to impress your friends (that's fun). The idea is to make it appear as though you have a larger block device than you actually do. On Linux, you accomplish this with a creative use of device-mapper.

First, a little introduction to device-mapper is in order. Device-mapper is a way to concatenate and extend block devices in ways that were not possible before. It is the basis of how LVM works (using a linear map by default). You can see the tables that LVM has created for you by typing 'dmsetup table'. On my workstation, for example, it outputs the following:
[root@rugrat ~]# dmsetup table
bootvg-usr: 0 20971520 linear 8:2 4194688
bootvg-var: 0 4194304 linear 8:2 25166208
bootvg-swap: 0 8388608 linear 8:2 147521920
bootvg-root: 0 2097152 linear 8:2 384
bootvg-data: 0 118161408 linear 8:2 29360512
bootvg-data: 118161408 65536 linear 8:2 155910528
bootvg-tmp: 0 2097152 linear 8:2 2097536

A little explanation of how these tables are formatted. Device-mapper tables are expressed in terms of 512-byte sectors. The first number is the starting position within the mapped device, and the second number is the length of that segment of the device. The third argument is the name of the target (linear in this case). All of the remaining arguments on the line are target-specific options.

There are various targets that you can use with device-mapper; these are actually kernel modules that get loaded at runtime. The available targets include:

linear - specifies linear regions on disks
striped - specifies striped devices
mirror - sets up mirrored devices
zero - creates a device of zeros (similar to /dev/zero, but as a block device)
multipath - used for setting up multiple paths to one device (for example on a Fibre Channel SAN)

There are other targets, but that should get you started for now.

The linear target has two target-specific options: the device that you're referring to, and the starting sector within that device. Note that while it's not done here, you can actually 'stack' dm devices, and this is where it becomes quite powerful. In all of these examples we're referring to 8:2, which you can decode by looking in the /dev directory and finding the device with major 8, minor 2 (/dev/sda2 in this case).

Where this gets interesting is that there are two entries for bootvg-data above. One covers sectors 0 through 118161407 of bootvg-data (a length of 118161408 sectors), and the other starts at sector 118161408 and continues for a length of 65536 sectors.
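To make the table format concrete, here is a minimal shell sketch that turns 'dmsetup table'-style lines into sector ranges and sizes. The function name dm_segments is my own invention, not part of dmsetup, and it only looks at the first four fields, so treat it as an illustration rather than a real tool.

```shell
#!/bin/sh
# dm_segments (hypothetical helper): read `dmsetup table`-style lines on
# stdin and print, for each segment, the device name, the first and last
# sector it covers, its size in MiB (truncated), and the target name.
dm_segments() {
    awk '{
        name = $1
        sub(/:$/, "", name)          # strip the trailing colon from the name
        start = $2                   # starting sector within the mapped device
        len = $3                     # segment length in 512-byte sectors
        # 2048 sectors of 512 bytes make up 1 MiB
        printf "%s sectors %d-%d %dMiB %s\n", name, start, start + len - 1, len / 2048, $4
    }'
}

# The two bootvg-data segments from the table above.
printf '%s\n' \
    'bootvg-data: 0 118161408 linear 8:2 29360512' \
    'bootvg-data: 118161408 65536 linear 8:2 155910528' | dm_segments
```

On a live system you would pipe the output of 'dmsetup table' (as root) into the function instead of the sample lines. Run against the two bootvg-data lines, it shows that the second segment begins exactly where the first one ends.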
The two bootvg-data entries indicate that this volume has been expanded in the past. Also, if the volume were spread out over multiple disks, there would likewise be multiple entries in its table.

Now that we've got basic device-mapper theory down, let's get into snapshots. Again, device-mapper is the basis for LVM snapshots. A snapshot is a point-in-time copy of a volume. The interesting thing about LVM snapshots, as compared to some hardware array vendors' snapshots, is that BOTH the source and snapshot volumes are read/write. We'll see how this can happen here.

When an LVM snapshot is created, LVM creates four device-mapper tables. They are, in this order and with these names (for a volume called base in the volume group vol0, with a snapshot volume called snap):

vol0-base-real: the original mapping of the volume
vol0-snap-cow: the copy-on-write (COW) device; this is a table that specifies physical storage
vol0-snap: the user-visible snapshot volume; the dm target is snapshot, with the origin being the 'real' volume above and the COW device being the COW device above
vol0-base: the user-visible base volume; the target is snapshot-origin, with the 'real' volume created above as its backing device

For the snapshot, reads of unmodified data come from the origin device specified for it, and all writes go to the COW device.

Now that we have a firm understanding of the basic facilities of device-mapper, let's explore how to create a sparse block device. The concept here is to combine two types of devices: zero and snapshot. First, we create a huge chunk of just zeros. Then we take a snapshot of it, using a small real volume as the COW device, so that writes have somewhere to land.

So let's create a 15TiB GFS filesystem with only 2GB of backing storage. (The reason for not using ext3 here is an architectural advantage of GFS: it does not allocate inode tables at filesystem creation time, so creating a large filesystem doesn't take hours or more than 2GB of storage.) To start, we need the number of 512-byte sectors that make up 15TiB:
[root@dhcp-144 ~]# echo $[15 * (2**40) /512]
32212254720

Now that we know the size, we can create a 15TiB zero device and take a snapshot of it, using a real 2GB logical volume that I created earlier as the backing store for the snapshot:
[root@dhcp-144 ~]# echo "0 32212254720 zero" | dmsetup create zero
[root@dhcp-144 ~]# echo "0 32212254720 snapshot /dev/mapper/zero /dev/vg0/backing P 16" | dmsetup create gfs-huge

Note that the above commands took a fraction of a second, and now I have a 15TiB block device that I can write just 2GB of data to. Next, let's create a GFS filesystem on this device:
[root@dhcp-144 ~]# gfs_mkfs -p lock_nolock -j 2 -t cluster2:temp /dev/mapper/gfs-huge
This will destroy any data on /dev/mapper/gfs-huge.
Are you sure you want to proceed? [y/n] y
Device: /dev/mapper/gfs-huge
Blocksize: 4096
Filesystem Size: 4026174448
Journals: 2
Resource Groups: 15360
Locking Protocol: lock_nolock
Lock Table: cluster2:temp
Syncing...
All Done
[root@dhcp-144 ~]# mount -t gfs /dev/mapper/gfs-huge /mnt

So now the GFS filesystem is created and mounted. Let's do a df on it and see that we really have a 15TiB filesystem:
[root@dhcp-144 ~]# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/gfs-huge   15T  1.5M   15T   1% /mnt

As you can see, the system believes that I've got a 15TiB filesystem mounted up!
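If you want to repeat this with a different size or backing volume, the arithmetic and the two table lines can be generated with a small shell sketch like the one below. The function names tib_to_sectors and sparse_tables are hypothetical helpers of my own, and the sketch assumes a shell with 64-bit arithmetic. It only prints the tables and does not call dmsetup itself. In the snapshot line, P asks for a persistent snapshot and 16 is the chunk size in 512-byte sectors, as in the commands above.

```shell
#!/bin/sh
# tib_to_sectors (hypothetical helper): number of 512-byte sectors in N TiB.
# One TiB is 2^40 bytes, so it holds 2^40 / 512 = 2^31 = 2147483648 sectors.
tib_to_sectors() {
    echo $(( $1 * 2147483648 ))
}

# sparse_tables (hypothetical helper): print the two device-mapper tables
# used in this post for a sparse device of $1 TiB whose writes land on the
# real block device $2. The zero device is assumed to be named "zero".
sparse_tables() {
    sectors=$(tib_to_sectors "$1")
    # Table for the all-zeros origin device
    echo "0 $sectors zero"
    # Table for the snapshot: origin, COW device, P = persistent, chunk size 16
    echo "0 $sectors snapshot /dev/mapper/zero $2 P 16"
}

# Reproduce the tables used above: 15 TiB backed by /dev/vg0/backing.
sparse_tables 15 /dev/vg0/backing
```

Each printed line can then be piped to 'dmsetup create' exactly as shown earlier. When you're done, unmounting the filesystem and running 'dmsetup remove gfs-huge' followed by 'dmsetup remove zero' should tear the devices down again, and 'dmsetup status gfs-huge' should report how much of the COW device has been used so far.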