Well, I've tried blogging before, running the backend server software myself. Maybe not having to maintain the software/hardware will actually let me think about posting to the blog!

11/14/2007

Creating sparse block devices

I've not found much material online about creating sparse block devices, and it's a useful thing if you're doing filesystem testing, or just want to pretend that you have more storage than you do to impress your friends (that's fun). The idea is to make it appear as though you have a larger block device than you actually do. On Linux, you accomplish this with a creative use of device-mapper.

First, a little bit of an introduction to device-mapper is in order. Device-mapper is a way to concatenate and extend block devices in ways not previously possible. It is the basis of how LVM works (using a linear map by default). You can see the tables that LVM has created for you by typing 'dmsetup table'. On my workstation, for example, it outputs the following:
[root@rugrat ~]# dmsetup table
bootvg-usr: 0 20971520 linear 8:2 4194688
bootvg-var: 0 4194304 linear 8:2 25166208
bootvg-swap: 0 8388608 linear 8:2 147521920
bootvg-root: 0 2097152 linear 8:2 384
bootvg-data: 0 118161408 linear 8:2 29360512
bootvg-data: 118161408 65536 linear 8:2 155910528
bootvg-tmp: 0 2097152 linear 8:2 2097536
A little explanation of how these tables are formatted: device-mapper tables are expressed in terms of 512-byte sectors. The first number is the starting position within the mapped device, and the second number is the length of that segment of the device. The third argument is the name of the target (linear in this case). All of the remaining arguments on the line are target-specific options.

There are various targets that you can use with device-mapper; these are actually kernel modules that get loaded at runtime. The available targets include:

linear - specifies linear regions on disks
striped - specifies striped devices
mirror - sets up mirrored devices
zero - creates a device of zeros (similar to /dev/zero, but as a block device)
multipath - used for setting up multiple paths to one device (for example on a Fibre Channel SAN)

There are other targets, but that should get you started for now.

The linear target has two target-specific options: the device that you're referring to, and the starting sector within that device. Note that while it's not done here, you can actually 'stack' dm devices, and this is where it becomes quite powerful. In all of these examples, we're referring to 8:2, which you can decode by looking in the /dev directory and finding the device with major 8, minor 2 (/dev/sda2 in this case).

Where this gets interesting is that there are two entries for bootvg-data above: one covering sectors 0 through 118161407 of bootvg-data, and the other starting at sector 118161408 and continuing for a length of 65536 sectors. This indicates that the volume has been expanded in the past. Also, if the volume were spread out over multiple disks, there would be multiple entries for it.

Now that we've got basic device-mapper theory down, let's get into snapshots. Again, device-mapper is the basis for LVM snapshots. A snapshot is a point-in-time copy of a volume.
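One aside before going further into snapshots: the sector arithmetic in the linear tables above is easy to check with a small helper. This is only a sketch, assuming the "name: start length target args" layout and 512-byte sectors described above; dm_describe is a made-up name for illustration, not part of dmsetup.

```shell
# Summarise one line of `dmsetup table` output.
# Assumes the layout "name: start length target args..." and
# 512-byte sectors. dm_describe is a hypothetical helper name.
dm_describe() {
    # Split the line into words: "name:", start, length, target, ...
    set -- $1
    name=${1%:}                          # drop the trailing colon
    start=$2
    len=$3
    target=$4
    end=$((start + len - 1))             # last sector of this segment
    mib=$((len * 512 / 1048576))         # segment size in MiB (rounded down)
    echo "$name: $target, sectors $start-$end, $mib MiB"
}

dm_describe "bootvg-data: 118161408 65536 linear 8:2 155910528"
```

For the second bootvg-data entry this reports a 32 MiB segment, which fits the idea that the volume was grown by a small amount at some point.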
The interesting thing about LVM snapshots, as compared to some hardware array vendors' snapshots, is that BOTH the source and snapshot volumes are read/write. We'll see how this can happen here. When an LVM snapshot is created, LVM creates four device-mapper tables. They are, in this order and with these names (for a volume called base in the volume group vol0, with a snapshot volume called snap):

vol0-base-real: the original mapping of the volume
vol0-snap-cow: the copy-on-write (COW) device; this is a table that specifies physical storage
vol0-snap: the user-visible snapshot volume; the dm target is snapshot, with the origin being the 'real' volume above and the COW device being the COW device above
vol0-base: the user-visible base volume; the target is snapshot-origin, with the 'real' volume created above as its backing device

For the snapshot, reads of blocks that have not been written since the snapshot was taken come from the backing device, and all writes go to the COW device.

Now that we have a firm understanding of the basic facilities of device-mapper, let's explore how to create a sparse block device. The concept here is to combine two types of devices: zero and snapshot. First, we create a huge device of just zeros; then we take a snapshot of it, so that anything written lands on a much smaller real device. So let's create a 15TiB GFS filesystem with only 2GB of backing storage. (The reason for using GFS rather than ext3 here is an architectural advantage of GFS: it does not allocate inode tables at filesystem creation time, so creating a large filesystem doesn't take hours or need more than 2GB.) First, we need to get the number of 512-byte sectors that 15TiB makes up:
[root@dhcp-144 ~]# echo $[15 * (2**40) /512]
32212254720
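The $[...] form used above is an older bash spelling of arithmetic expansion; $((...)) is the standard one. As a rough sketch, the same calculation can be wrapped in a function so other sizes are easy to try. The name tib_to_sectors is my own, and the multiplication is written out instead of using ** so that it should also work in a plain POSIX sh.

```shell
# Convert a size in TiB to a count of 512-byte sectors.
# 1 TiB = 1024^4 bytes. tib_to_sectors is a hypothetical helper name.
tib_to_sectors() {
    echo $(( $1 * 1024 * 1024 * 1024 * 1024 / 512 ))
}

tib_to_sectors 15
```

For 15 it gives the same 32212254720 as the command above.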
With the sector count in hand, we can create a 15TiB zero device and then take a snapshot of it, using a real 2GB logical volume that I created earlier (/dev/vg0/backing) as the COW device. In the snapshot table, P asks for a persistent snapshot and 16 is the chunk size in sectors:
[root@dhcp-144 ~]# echo "0 32212254720 zero" | dmsetup create zero
[root@dhcp-144 ~]# echo "0 32212254720 snapshot /dev/mapper/zero /dev/vg0/backing P 16" | dmsetup create gfs-huge
Note that the above commands took a fraction of a second, and now I have a 15TiB block device that I can write just 2GB of data to. Next, let's create a GFS filesystem on this device:
[root@dhcp-144 ~]# gfs_mkfs -p lock_nolock -j 2 -t cluster2:temp /dev/mapper/gfs-huge 
This will destroy any data on /dev/mapper/gfs-huge.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/mapper/gfs-huge
Blocksize:                 4096
Filesystem Size:           4026174448
Journals:                  2
Resource Groups:           15360
Locking Protocol:          lock_nolock
Lock Table:                cluster2:temp

Syncing...
All Done


[root@dhcp-144 ~]# mount -t gfs /dev/mapper/gfs-huge /mnt
Now that the GFS filesystem is created and mounted, let's run df on it and see that we really have a 15TiB filesystem:
[root@dhcp-144 ~]# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/gfs-huge   15T  1.5M   15T   1% /mnt
As you can see, the system believes that I've got a 15TiB filesystem mounted up!
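Since there is really only 2GB behind this device, it's worth keeping an eye on how full the COW store is; as far as I know, once it fills up the snapshot is invalidated. Here is a minimal sketch that reads one line of 'dmsetup status' output for a snapshot target. It assumes the "start length snapshot used/total ..." layout, cow_usage is a made-up helper name, and the sample line is invented for illustration, not taken from the system above.

```shell
# Report how full a snapshot's COW store is, given one line of
# `dmsetup status <name>` output. Assumes the layout
# "start length snapshot used/total ..." (sector counts).
# cow_usage is a hypothetical helper name.
cow_usage() {
    set -- $1
    if [ "$3" != "snapshot" ]; then
        echo "not a snapshot target" >&2
        return 1
    fi
    case $4 in
        */*) ;;                                   # looks like used/total
        *) echo "snapshot state: $4"; return 1 ;; # e.g. an invalidated snapshot
    esac
    used=${4%/*}
    total=${4#*/}
    echo "$used of $total sectors used ($((used * 100 / total))%)"
}

# Invented sample line, for illustration only:
cow_usage "0 32212254720 snapshot 1048576/4194304 4096"
# On a live system, something like: cow_usage "$(dmsetup status gfs-huge)"
```

When you're done playing, unmount the filesystem and remove the devices in reverse order of creation: 'dmsetup remove gfs-huge', then 'dmsetup remove zero'.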

11/04/2007

Shipyard Pumpkinhead Ale

I'm on a pumpkin beer kick right now, as I am pretty much every year at this time. At Standings, my favorite bar in NYC, we are fortunate enough to have two on tap right now (and one that just kicked - the Southampton Pumpkin Ale). The two that we have now are the Shipyard and the Fisherman's Pumpkin Stout.

The Shipyard is overpoweringly spicy, but I like it because of that. There is no doubt that this is a pumpkin beer - and it seems to have a hint of banana (another one of my favorite things) in it.

The Fisherman's stout is, well, a stout. Obviously it's quite a dark beer, as opposed to the amber of either the Southampton or the Shipyard. In judging that beer, you have to realize that it's a dark beer, and the spice is not designed to be overpowering (but it's still there).

Since today is Sunday, and Standings is a sports bar (in particular a Boston sports bar, and the Patriots/Colts game is on), there are all types here, including those that are drinking Bud Light at a craft beer bar (ugh!), and the ones that expect a high-end beer bar to serve hard liquor. There's an interesting reason that they can't here (besides the fact that it would really not be conducive to the high-end beer feel of the place) - they are just down the street from a church, and there is either a NYC or NY state regulation (not sure which) that you cannot have a liquor license within a certain distance of a church (but the restaurant wine licenses that all establishments on this block have are OK). Weird if you ask me - permit them to either have a full license around here or none at all (although that would mean that this fine establishment wouldn't be here).