Recently in storage Category

"How do I make my own Dropbox without using Dropbox" is a question we get a lot on ServerFault.

And judging by the Dropbox Alternatives question, the answer is pretty clear.

iFolder.

Yes, that Novell thingy.

I've used the commercial version, but the open-source version does most of what the paid one does. I suspect the end-to-end encryption option is not included, possibly due to licensing concerns. But the whole, "I have this one directory on multiple machines that exists on all of 'em, and files just go to all of them and I don't have to think about it," thing is totally iFolder.

The best part is that it has native clients for both Windows and Mac, so no futzing around with Cygwin or other GNU compatibility layers.

An older problem

I deal with some large file-systems. Because of what we do, we get shipped archives with a lot of data in them. Hundreds of gigs sometimes. These are data provided by clients for processing, which we then do. Processing sometimes doubles, or even triples or more, the file-count in these filesystems depending on what our clients want done with their data.

One 10GB Outlook archive file can contain a huge number of emails. If a client desires these to be turned into .TIFF files for legal processes, that one 10GB .pst file can turn into hundreds of thousands of files, if not millions.

I've had cause to change some permissions at the top of some of these very large filesystems. By large, I mean larger than the big FacShare volume at WWU in terms of file-counts. As this is on a Windows NTFS volume, it has to walk the entire file-system to update permissions changes at the top.

This isn't the exact problem I'm fixing, but it's much like in some companies where granting permissions to specific users is done instead of to groups, and then that one user goes elsewhere and suddenly all the rights are broken and it takes a day and a half to get the rights update processed (and heaven help you if it stops half-way for some reason).

Big file-systems take a long time to update rights inheritance. This has been a fact of life on Windows since the NT days. Nothing new here.
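To put rough numbers on that, here is a minimal Python sketch that counts how many objects a top-level permissions change has to touch and turns that into a time estimate. The function names and the objects-per-second rate are assumptions for illustration only, not measurements of NTFS.

```python
import os


def count_acl_targets(root):
    """Count every directory and file under root, including root itself."""
    dirs = 1  # the top-level directory where the change is made
    files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files


def estimated_hours(objects, objects_per_second):
    """Hours needed to touch every object at an assumed, constant update rate."""
    return objects / objects_per_second / 3600
```

At a hypothetical 100 objects per second, a volume with 3.6 million files and directories would take about ten hours. The point is only that the cost scales with object count, not with the amount of data.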

But... it doesn't have to be this way. I explain under the cut.
HP has been transitioning away from the cciss Linux kernel-driver for a while now, but there hasn't been much information about what it all means. Just on the name alone the module needed a rename (one possible acronym of cciss: Compaq Command Interface for SCSI-3 Support), and it is a driver that has been in the Linux ecosystem a really long time (since at least the 2.2 kernel era). A lot has changed in the kernel.

HP has finally released a PDF describing the whole cciss vs. hpsa thing.

Read it here: http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02677069/c02677069.pdf

The key differences:
  • HPSA is a SCSI driver, not a block-driver like CCISS
  • This means that the devices are moving from /dev/cciss to /dev
  • Device node numbers will change
  • New controllers will increment kernel names, so a second controller will be /dev/sda, not /dev/sdb, so use udev names (partition ID, disk-ID, that kind of thing) to avoid pain.
  • For newer kernels (2.6.36+) cciss and hpsa can load at the same time if the system contains hardware that needs those drivers.
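The advice about udev names can be checked with a short script. This is a minimal Python sketch, assuming a Linux system where udev populates /dev/disk/by-id; the function name persistent_names is made up for illustration. It maps each persistent name to whatever kernel device name it currently resolves to.

```python
import os


def persistent_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent udev symlink name to the device node it points at."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        # Not a udev-managed Linux system, or no disks are listed.
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping
```

Calling persistent_names() before and after a driver change shows which kernel names moved, while the by-id names used in fstab and similar config files stay put.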

Is network now faster than disk?

Way back in college, when I was earning my Computer Science degree, the latencies of computer storage were taught like so:

  1. On CPU register
  2. CPU L1/L2 cache (this was before L3 existed)
  3. Main Memory
  4. Disk
  5. Network
This question came up today, so I thought I'd explore it.

The answer is complicated. The advent of Storage Area Networking was made possible because a mass of shared disk is faster, even over a network, than a few local disks. Nearly all of our I/O operations here at WWU are over a fibre-channel fabric, which is disk-over-the-network no matter how you dice it. With iSCSI and FC over Ethernet this domain is getting even busier.

That said, there are some constraints. "Network" in this case is still subject to distance limitations. A storage array 40km from the processing node will still see higher storage latency than the same type of over-the-network I/O 100m away. Our accesses are fast enough these days that the speed-of-light round-trip time for 40km is measurable versus 100m.
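The distance penalty is easy to estimate. Here is a back-of-the-envelope Python sketch, assuming light in glass fibre travels at roughly the vacuum speed divided by a refractive index of 1.5 (an assumed round figure), and ignoring switches, transceivers, and the array itself.

```python
SPEED_OF_LIGHT_VACUUM_M_PER_S = 299_792_458
FIBRE_REFRACTIVE_INDEX = 1.5  # assumption: a round figure for glass fibre


def round_trip_microseconds(distance_m):
    """Propagation delay only, there and back, in microseconds."""
    speed_in_fibre = SPEED_OF_LIGHT_VACUUM_M_PER_S / FIBRE_REFRACTIVE_INDEX
    return 2 * distance_m / speed_in_fibre * 1e6
```

Under those assumptions 40km costs roughly 400 microseconds per round trip and 100m costs roughly 1 microsecond. That difference is large next to a cached array read, which is why it shows up in measurements.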

A very key difference here is that the 'network' component is handled by the operating system and not application code. For SAN an application requests certain portions of a file, the OS translates that into block requests, which are then translated into storage bus requests; the application doesn't know that the request was served over a network.

For application development the above tiers of storage are generally well represented.

  1. Registers: unless the programming is in assembly, most programmers just trust the compiler and OS to handle these right.
  2. L1/2/3 cache: as above, although well-tuned code can maximize the benefit this storage tier can provide.
  3. Main memory: this is directly handled through code. One might argue that at a low level memory handling constitutes a majority of what code does.
  4. Disk: this is represented by file-access or sometimes file-as-memory API calls. These tend to be discrete calls from main memory.
  5. Network: this is yet another completely separate call structure, which means using it requires explicit programming.
Storage Area Networking is parked in step 4 up there. Network can include things like making NFS connections and then using file-level calls to access data, or actual Layer 7 stuff like passing SQL over the network.
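As an illustration of the two step-4 call styles, here is a minimal Python sketch with made-up function names. One function uses ordinary file-access calls and the other uses a file-as-memory mapping. Neither knows or cares whether the path lives on a local disk or a SAN volume; that part belongs to the OS.

```python
import mmap


def read_with_file_api(path, offset, length):
    """Explicit file-access calls: open, seek, read."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def read_with_mmap(path, offset, length):
    """File-as-memory: map the file, then slice it like a bytes object."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; this fails on an empty file.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]
```

Reaching the same bytes over step 5 would mean opening a socket or speaking a protocol like HTTP or SQL, which is a different set of calls entirely.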

For massively scaled out applications, the network has even crept into step 3 thanks to things like memcached and single-system-image frameworks.

Network is now competitive with disk, though so far the best use-cases let the OS handle the network part instead of the application doing it.

Rogue file-servers

Being the person who manages our centralized file-server, I also have to deal with storage requests. The requests get directed to a layer or two higher than me, but I'm the one who has to make it so, or add new storage when the time comes. People never have enough storage, and when they ask for more, sticker shock often means they decide they can't have it.

It's a bad situation. End-users have a hard time realizing that the $0.07/GB hard-drive they can get from NewEgg has no bearing on what storage costs for us. My cheap-ass storage tier is about $1.50/GB, and that's not including backup infrastructure costs. So when we present a bill that's much more than they're expecting, the temptation to buy one of those 3.5TB 7.2K RPM SATA drives from NewEgg and slap it in a PC-turned-fileserver is high.
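The gap is easier to see with a concrete request. This is a small Python sketch using the two per-GB figures above; the 500GB request size is a made-up example, and backup costs are left out just as they are in the figure quoted.

```python
CONSUMER_DRIVE_USD_PER_GB = 0.07   # the NewEgg figure quoted above
CENTRAL_STORAGE_USD_PER_GB = 1.50  # the cheap central tier, before backup costs


def cost_usd(gigabytes, usd_per_gb):
    """Raw capacity cost in dollars at a flat per-GB rate."""
    return gigabytes * usd_per_gb


def markup(central_rate, consumer_rate):
    """How many times more the central tier costs per GB."""
    return central_rate / consumer_rate
```

A hypothetical 500GB request works out to about $35 at the consumer price and $750 at the central-storage price, a factor of a bit over 21. That is the sticker shock.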

Fortunately(?) due to the decentralized nature of the University environment, what usually happens is that users go to their college IT department and ask for storage there. For individual colleges that have their own IT people, this works for them. I know of major storage concentrations that I have absolutely nothing to do with in the Libraries and the College of Science and Technology, and a smaller but still significant amount in Huxley. CST may have as much storage under management as I do, but I can't tell from here.

Which is to say, we generally don't have to worry about this problem. That problem? That's what happens when you have a central storage system that can't meet demand, and no recourse for end-users to fix it some other way.

And I'd hate to be the sysadmin who has to come down on that person like a ton of bricks. I'd do it, and I wouldn't like it, because I also hate not meeting my users' needs that flagrantly, but I'd still do it. Having users do that kind of end-run leads to pain everywhere in time.

Changing student storage habits

I had to do some maintenance on my script that gathers disk-space usage, so the stats database has been on my mind lately. It's been a while since I posted any graphs. This particular graph is a unified chart of the student home-directory volumes over time. I merged the NetWare and Windows volumes into a single space-used chart.

stu-vols-2011.png
This is a very noisy chart. The discontinuities are mostly student-account-purge events that happen once a quarter, but the fall purge is by far the largest.

Note the downward tail at the end! The same chart for staff is a pretty smooth line straight up at a pretty steady slope. This? Clearly usage-habits are changing. I don't know if this is reflected by habitual USB-drive use or if they're using the cloud in some way to store their files, but clearly student-driven storage demand (at least for home-directories) is falling.

One area where it is clearly increasing is the Blackboard Content volume.
bbcontent-2011.png
This data is noisy in that we purge old courses, but we've also changed how many quarters of courses we keep in the system. Looking at this growth chart, it's pretty clear to me that the downtick in student home-directory and class-volume consumption is made up for in increased Blackboard usage. Each quarter more and more professors sign on, other professors increase their usage, and the average size of the files being passed into the system increases.

LIO-Target on OpenSUSE 11.3

I mentioned I was playing with it, but now I have it working. So I'm sharing! Yay!

LIO-Target is one of several iSCSI modules available for Linux. As of the 2.6.38 kernel it'll be baked in. They even have a handy feature-comparison chart to explain why. For those with Microsoft or ESX environments, LIO-Target supports SCSI-3 persistent reservation, which is needed for clustering in both environments. It is nifty.

Disclaimer: There are some steps in this guide that I'm not going to give command-by-command guides to. If you don't know how to do that step you shouldn't be doing this at all. I know it's unfriendly, but not being able to do that means you don't really know what you're doing and this kind of thing really isn't for you.

Anyway, this is how it works for 11.3, probably 11.2, and maybe not 11.4. 11.4 is still baking at the time of this post, and I'm reasonably certain that the LIO-Target stuff won't be mainline in time for feature freeze, but hey, won't know until it ships. It'll almost definitely be there for the next OpenSUSE version, be it 11.5 or 12.0.

Until such time as it becomes baked into OpenSUSE, getting LIO-Target into it will require compiling custom kernel modules and hand editing certain key config files. Apparently there are some advanced UI tools available from Rising Tide Systems, called 'rtsadmin', but I have not evaluated them.

In case you don't give a fig for this, I'm putting the guide under the fold.

Getting dirty with iscsi

I'm working on a low-cost storage solution again. This is the same thing I was working on earlier this year, but the budget demons have eaten the proposal that would have required this thing to be replicated on another array, so I can actually move on it. Since my last round of software evals was some months ago, I'm taking another look at things. And really, it's different.

The criteria I'm dealing with right now:

  1. Must not cost anything more.
  2. It would be really, really nice to support SCSI 3 Persistent Reservation, as systems that require that are where most of my storage demand is these days.
  3. Since the Windows iSCSI initiator doesn't auto-reconnect when the connection fails, unlike Linux, the iSCSI target software must not require a service restart to make config changes.
This limits things.

Also, if point number 3 above can be configured away somehow, I haven't found it yet. Though I'd be happy (really happy) if I were wrong. Do let me know if you know differently.

OpenFiler, my previous best-bet, uses the Linux IET iSCSI system. Which unfortunately requires a restart to work. Therefore, I can't use it. The alternative is to shim in the newer LIO Target system onto OpenFiler, but if I'm going to do that I may as well use something with a newer kernel (like OpenSUSE) to get at the newer packages.

LIO-Target has taken me quite some time to crowbar onto OpenSUSE 11.3, but I finally found the right pry points. It states on the box that it does SCSI 3 PR, and I've just proven that it can make config changes without requiring a restart. JOY.

As it happens, LIO-Target will be replacing the current kernel-iscsi system as of 2.6.38. This also means that it is a highly moving target.

Unfortunately, the need for a crowbar means that if I decide to go production with this, the effort needed to, shall we say, keep things current will be all on me. Right now it's requiring a module recompile after every kernel update, which makes it a significant support burden. Also, a UI doesn't exist yet, so I'll have to create the management scripts from scratch.

One alternative is to wait until OpenSUSE 11.4, which should have a newer kernel. Unfortunately, at this point it looks like that'll be 2.6.37. So if I want to use 2.6.38, I'll have to do the kernel-dance m'self. Grar.

I should probably factor the time I spend dealing with this thing into our cost-per-GB.

Clear error messages

It isn't often this happens, but every so often I run into a really clear error message. All too often it's stuff like "0x80042000" that I then have to google to figure out. So imagine my delight when I ran into this baby the other day:

P800-Dead.png

Hard to be more clear than, "dlu=0DEAD:DEADh".

Fixing it turned out to be remarkably easy. Just pop the battery card off of the controller and reinsert. That seems to have cleared whatever it was that caused this. And off I go. In this case, this server had been powered off for close to 3 months and I'm guessing the RAID battery had simply fully discharged.

File-system obsessions

If there is one thing that separates Windows sysadmins from Linux sysadmins it is worry about file-systems. In Windows, there is only one file-system, NTFS, and it updates whenever Microsoft releases a new Server OS. The main concerns are going to be about correct block-size selection for the expected workload and in recent versions, how and whether to use ShadowCopy.

The Linux side is a lot more complex. Because of this complexity sysadmins pick favorites out of the pack through a combination of empirical research and just plain gut-level "I like it". The days when ext2 and ext3 were your only choices are long, long gone. Now the decision of how to format your storage for specific data loads has to take into account the various file-system features, strengths, weaknesses, and quirks, and then how to optimize the mkfs settings for a best fit. A file-system that will contain a few directories with tens to hundreds of thousands of files in them is probably not the same filesystem you'd pick to handle 100's of TB worth of 8-20GB files.

Whenever a new file-system hits the Linux kernel there is a lot of debate over its correct usage and whether or not it'll replace older file-systems. btrfs and ext4 have gotten the most debate recently, with ext4 finally in 'stable' while btrfs remains 'experimental'. ZFS continues to be something everyone wants but can't have, the Solaris admins get to be smug and the BSDians ignore all the fuss and just use it, while the btrfs devs ask for patience while they finish making a file-system that does everything XFS can do.

What this means is that you sometimes see Linux admins succumbing to New Shiny! syndrome and installing an experimental file-system on a production system. I saw this frequently with ext4 while it was still staging up. When used with LVM, the new file-systems have a lot of very interesting features that aren't possible with ye olde ext3+LVM.

  • Break previous directory-size limits
  • Use extents instead of block-allocation, which reduces fragmentation and speeds up fscks
  • Tracking of free block groups makes fscks go even faster
  • Checksumming journal writes for consistency
  • Checksumming entire file-system structures for improved consistency checking
  • Better timestamp granularity, for when milliseconds are too large for your application
  • In-filesystem snapshots, above and beyond the snapshots LVM allows
  • Improved allocated file tracking makes handling large directories much more efficient
What's not to love? It's for this reason that you get questions like this one on ServerFault wondering if Windows has a file-system that can do log-checksumming, where the asker has a New Shiny! feature they really like and clearly hasn't realized that there is only one file-system on Windows and it's what Microsoft gives you. Apple does exactly the same thing with their Xserve gear, and so did Novell until they ported NetWare's key features over to Linux (NSS is very good, but some workloads are best suited to something else).

This can be a real challenge for Windows admins attempting to get into the Linux space, since "filesystem choice" is not something they've had to worry about since FAT stopped being a viable option. The same goes for Linux admins getting into Windows administration, where the lack of choice is seen as yet another sign that Windows is fundamentally inferior to Linux. The differing mindsets are something I see in the office a few times a year.
