Recently in virtualization Category

AMD has released Interlagos, the server version of the Bulldozer CPU class they launched over a month ago.

Bulldozer/Interlagos is AMD's attempt to grab more of the market from Intel. Currently, it's competing in the value sector but not on performance. The days when AMD CPUs were the virtualization kings have been gone for a couple years now. AMD would like that crown back, thank you, and they're driving to get there.

That said, comparing performance between equivalently clocked AMD and Intel CPUs is hard. They're optimized for different tasks, which means that the smart Systems Engineer looking for the next CPU to base their environment on should pay attention. Workload matters! Those AMD CPUs may be damned cheap compared to Intel, but if you're doing the wrong things with them you'd be better off buying previous-gen Intel chips.

The most controversial thing AMD has done is to make two cores share a Floating Point Unit. They've also done quite a bit of optimization in their Arithmetic Logic Unit, where Integer math is handled. The reasoning behind this is that most server usage these days consists of integer-heavy, highly parallelizable workloads; most database and simple web-serving workloads are entirely Integer and parallel-friendly, and that's a large part of the webapp stack right there. The likes of Google Plus, StackExchange, and Reddit do far more Integer work than floating-point, so something like Interlagos should be a good fit.

And the early benchmarks show that AMD does indeed have an edge on integer-heavy workloads over equivalent generation Intel parts. Intel still has an edge on compute-performance-per-watt, but AMD holds the edge on compute-performance-per-GHz. Pick which is more important to you.

Specialist workloads like render farms are edge cases, if big consumers, so engineering to handle those workloads is not worth the time. By staking out the middle of the market, AMD can drive innovation in the marketplace by forcing Intel to get creative in the middle. It's good for everyone.



Yes, but what about me, you cry.
Or something. News of the new vSphere 5 pricing guide has leaked out. Kind of like the Netflix announcement, it has raised a lot of ire among VMware's customers, as would be expected when your preferred vendor announces you'll be paying a lot more.

The key problem has to do with how they're changing the licensing model for vSphere. We knew they'd change it, we just didn't know if they were going to put DRS and HA into a new Enterprise Plus Ultra tier, or do something else. They did something else.

With vSphere 4 the licensing tiers were based on the processor socket, number of cores, and desired features. If you had over 6 cores on that processor, you needed Enterprise Plus to use them all. If you had 6 or fewer, you could go with one of the three cheaper options.

With vSphere 5 the licensing tiers are now based on a combination of processor socket and RAM (as well as features). A 2-core socket counts as much as a 12-core socket in this scheme (yay). Unfortunately, if that dual-socket 12-core server has 256GB of RAM in it, you'll be paying for 6 Enterprise Plus licenses and not the 2 you were paying under vSphere 4. Also? The prices for Enterprise Plus haven't changed, so you just tripled your licensing costs.
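The arithmetic behind that tripling can be sketched as follows. This is a rough model, not VMware's actual rule set: it follows the post's simplification of counting the RAM in the box, and the 48 GB-per-license entitlement is an assumption (the post does not state it) chosen because it reproduces the six-license figure. The function names are made up for illustration.

```python
import math

def vsphere4_licenses(sockets: int) -> int:
    """vSphere 4 style: one license per populated CPU socket."""
    return sockets

def vsphere5_licenses(sockets: int, ram_gb: int, ram_per_license_gb: int) -> int:
    """vSphere 5 style: at least one license per socket, plus however many
    are needed so the pooled RAM entitlement covers the RAM in use."""
    needed_for_ram = math.ceil(ram_gb / ram_per_license_gb)
    return max(sockets, needed_for_ram)

# Assumption: 48 GB of RAM entitlement per Enterprise Plus license.
ENTERPRISE_PLUS_GB = 48

old = vsphere4_licenses(sockets=2)
new = vsphere5_licenses(sockets=2, ram_gb=256,
                        ram_per_license_gb=ENTERPRISE_PLUS_GB)
print(f"vSphere 4: {old} licenses, vSphere 5: {new} licenses, "
      f"{new / old:.1f}x the cost at unchanged prices")
```

With the same inputs, a server with 96GB or less would still need only its two per-socket licenses, which is why the new model favors smaller hosts.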

vSphere 4's licensing model encouraged cramming as much RAM into a single server as possible. 12-core CPUs and buckets and buckets of RAM. And this happened, since cheaper is always good, and most VM environments are more RAM constrained than CPU constrained. With pricing per socket and not per core, you could maintain efficient RAM-to-Core ratios with licensing efficiency to boot.

vSphere 5's licensing model encourages servers with far fewer cores and a lot less RAM. Keeping a good RAM-to-Core ratio will involve a lot more physical hosts if you wish to maintain licensing efficiency. And you simply won't be able to reach the heights of efficiency you could with vSphere 4.

This is going to be expensive. We'll see whether the industry as a whole moves to something else (I'm sure Citrix is salivating at the thought of upgraders moving to XenServer instead of vSphere), or lumps it and just starts resenting the hell out of VMware the way it already resents (but still uses) Oracle.

Is network now faster than disk?

Way back in college, when I was earning my Computer Science degree, the latencies of computer storage were taught like so, from fastest to slowest:

  1. On CPU register
  2. CPU L1/L2 cache (this was before L3 existed)
  3. Main Memory
  4. Disk
  5. Network

This question came up today, so I thought I'd explore it.

The answer is complicated. The advent of Storage Area Networking was made possible because a mass of shared disk is faster, even over a network, than a few local disks. Nearly all of our I/O operations here at WWU are over a fibre-channel fabric, which is disk-over-the-network no matter how you dice it. With iSCSI and FC over Ethernet this domain is getting even busier.

That said, there are some constraints. "Network" in this case is still subject to distance limitations. A storage array 40km from the processing node will still see more storage latencies than the same type of over-the-network I/O 100m away. Our accesses are fast enough these days that the speed-of-light round-trip time for 40km is measurable versus 100m.
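To put rough numbers on that distance penalty, here is a back-of-the-envelope sketch. It assumes signals in optical fibre travel at about two thirds of the speed of light in a vacuum (roughly 200,000 km/s) and it ignores switch, HBA, and array latency entirely, so treat the output as a floor rather than a measurement.

```python
# Assumption: propagation speed in fibre is about 200,000 km/s,
# i.e. roughly two thirds of c. Real links add equipment latency on top.
FIBRE_SPEED_KM_PER_S = 200_000.0

def round_trip_us(distance_km: float) -> float:
    """Round-trip propagation delay in microseconds over distance_km of fibre."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1_000_000

far = round_trip_us(40.0)   # storage array 40 km away
near = round_trip_us(0.1)   # storage array 100 m away
print(f"40 km: {far:.0f} us round trip, 100 m: {near:.0f} us round trip")
```

Under those assumptions the 40km path costs a few hundred microseconds per round trip against about one microsecond for 100m, which is noticeable when the array itself answers in well under a millisecond.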

A very key difference here is that the 'network' component is handled by the operating system and not application code. For SAN an application requests certain portions of a file, the OS translates that into block requests, which are then translated into storage bus requests; the application doesn't know that the request was served over a network.

For application development the above tiers of storage are generally well represented.

  1. Registers. Unless the programming is in assembly, most programmers just trust the compiler and OS to handle these right.
  2. L1/2/3 Cache. As above, although well-tuned code can maximize the benefit this storage tier can provide.
  3. Main memory. This is directly handled through code. One might argue that at a low level memory handling constitutes a majority of what code does.
  4. Disk. This is represented by file-access or sometimes file-as-memory API calls. These tend to be discrete calls from main memory.
  5. Network. This is yet another completely separate call structure, which means using it requires explicit programming.

Storage Area Networking is parked in tier 4 up there. Network can include things like making NFS connections and then using file-level calls to access data, or actual Layer 7 stuff like passing SQL over the network.
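The difference in call structure between the disk and network tiers can be seen in a small sketch. The payload and the temporary file are invented for the example, and `socket.socketpair()` stands in for a real remote peer so that it runs on one machine.

```python
import mmap
import os
import socket
import tempfile

payload = b"hello from storage\n"

# Disk via file-access calls: explicit open and read.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)
with open(path, "rb") as f:
    via_file = f.read()

# Disk via file-as-memory calls: map the file, then slice it as if it
# were an ordinary bytes object sitting in main memory.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        via_mmap = m[:]
os.remove(path)

# Network: a separate call structure (sockets), programmed explicitly.
left, right = socket.socketpair()
with left, right:
    left.sendall(payload)
    via_socket = right.recv(len(payload))

print(via_file == via_mmap == via_socket == payload)
```

If the temporary file lived on a SAN LUN or an NFS mount, the first two snippets would not change at all; only the third makes the network visible to the application.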

For massively scaled out applications, the network has even crept into step 3 thanks to things like memcached and single-system-image frameworks.

Network is now competitive with disk, though so far the best use-cases let the OS handle the network part instead of the application doing it.

For the most part, I don't deal with software licensing. I like it this way. WWU has one person for whom dealing with software licensing is a core part of her job. She does a great job at it! And like a tax-accountant, the fact that this is what she does day in and day out means that I rarely end up applying head to appropriate walls over licensing.

Licensing is a hard, hard problem, especially when you throw that heady mix of virtualization and discounts into it. The software industry as a whole is still figuring out how to license stuff in a virtual environment, and we work in an industry that typically gets discounts above and beyond what others get. Add in a healthy dash of smaller software vendors that have simpler licensing regimes, and you have enough for a full-time job.

Our particular place in the software ecosystem is defined by these characteristics:

  • Higher Education
  • State Government (we are a public university that receives state support)
  • ~4,200 staff
  • ~14,000 full-time-equivalent students
  • ~23,000 users (+/- 1,500 depending on our exact percentage of part-time students and where we are in the year)
  • ~1,800 computer-lab seats
  • ~3,000 computers

Any given software licensing regime will take one or more of the above. Most will take 'Higher Ed' into account in their licensing, which we appreciate since it makes it cheaper. Being State means we can leverage master-contracts negotiated by the State, but frequently the higher-ed discount makes independent contracts cheaper than going through Olympia.

Then comes the rest of it.

For stuff that everyone gets, like anti-virus software, the per-seat charge is where we pay the closest attention. The last time the AV contract went out for bid, the entity that won did so in large part because they applied their per-seat charge against the staff count (4,200), where the others applied it to the FTE-student + Staff number (18,200); even with a higher per-seat charge that still came in markedly cheaper.
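The effect of the seat-count basis is easy to see with a quick calculation. The seat counts are the ones above; the per-seat prices are hypothetical, chosen only to show a bid with triple the per-seat charge still winning.

```python
STAFF = 4_200
FTE_STUDENTS = 14_000

def contract_cost(per_seat: float, seats: int) -> float:
    """Total cost of a contract priced per seat."""
    return per_seat * seats

staff_only_seats = STAFF
staff_plus_students_seats = STAFF + FTE_STUDENTS

# Hypothetical bids: the prices are invented for illustration.
bid_staff_basis = contract_cost(per_seat=12.00, seats=staff_only_seats)
bid_fte_basis = contract_cost(per_seat=4.00, seats=staff_plus_students_seats)

# How much higher the staff-basis per-seat price can go before it loses.
break_even_ratio = staff_plus_students_seats / staff_only_seats

print(f"staff basis: ${bid_staff_basis:,.0f}")
print(f"staff + FTE basis: ${bid_fte_basis:,.0f}")
print(f"break-even per-seat price ratio: {break_even_ratio:.2f}x")
```

In other words, the vendor counting only staff could charge a bit more than four times the per-seat price of its competitors before the totals evened out.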

For stuff that goes on every lab-seat, we have a mix of software that has to be licensed for every seat (1800) or a concurrency arrangement that requires a license server. We like the concurrency arrangement, but it means we need a license server.

Which brings me to license servers. We do have a FlexLM license-server, but not everything uses FlexLM. Also, FlexLM version support may end up forcing us to have more than one licensing server (which incurs OS licensing costs). It's working, but a trial.



That's end-user software licensing, but IT-software can be worse. We solved our Microsoft problem by caving and getting a Select agreement. While our client-access-licenses are covered, we still need to purchase Server licenses. Rolling out Exchange 2010 required picking up OS and Exchange server licenses, but we didn't have to worry about umpteen thousand CALs for our end-users. Simple, in its way. However, knowing, or knowing how to find out, exactly which products are covered by our Select agreement is part of why our licensing person does this full time.

I have yet to meet a system administrator of any experience who hasn't had to shake their fist at à la carte pricing for IT software. We ended up changing our backup vendor over a gross misunderstanding of the differences in licensing models between our old vendor and the prospective new one. The new one wasn't any cheaper; the biggest benefit is that we have to buy new licenses a lot less often than we had to with the old vendor.

IT software pricing competence is the biggest value-add that Value Added Resellers give, at least for us (see also, Higher Ed discounts). We don't do enough volume with IT software for our in-house software person to gain any real experience with it, so we have to trust outsiders. We trust them to know WTF they're talking about when it comes to reselling IT software. Which means that when they fail us, we get very angry. The backup software debacle ended up costing us about 120% more than we expected to pay once we got truly fully licensed for what we wanted to do, and getting to that step took nearly two years before we got it all ironed out. Since then, however, they've caught things we never even knew existed.

As with tax accountants handling very complex returns, the definitive answer for exactly how much licensing is needed varies with who you ask. You'd think the rules would be nicely deterministic, but that presupposes complete knowledge of the whole system and uniform understanding of definitions. In the past, when trying to find out what licensing was required for one specific thing I have in mind, we got different answers from different people at the vendor itself, as well as from a couple of VARs. Four people, four answers, two of them from people who actually work for the company we were trying to buy from, and a fairly large price spread (though the higher-ed discount percentage was uniform across all of them).

Who do you trust? Whichever one sounds reasonable, and hope. If you get in trouble with a vendor, at least you have someone complicit in your unintended perfidy. In the case of the backup software we learned the hard way that how the software enforces licenses was different from how the VAR understood licensing to work (imperfect understanding of definitions). They did help make things right, but the sheer magnitude of the definition misunderstanding still made it very expensive.



For entities smaller than us, licensing is just as much of a headache, especially for end-user software. They may not have a Select agreement so may be purchasing off of the Microsoft/Adobe/Apple rolling cart. Virtual desktops, which we've avoided so far, make that hard to pin down since the definition of 'machine' is variable. I'm not in that space, but I hear things. It's hard.

It's hard everywhere.

I would not be at all surprised if someone could found a business on advising companies on software licensing issues.

A network problem

I have a server attempting to talk SMTP to our internal smart-host. But it seems our hardware load-balancer is getting in the way. When sniffing the switch-port the server is on, the conversation goes like this:

Server -> Mailer [SYN]
Mailer -> Server [SYN, ACK]
Server -> Mailer [ACK]
Mailer -> Server [RST, ACK]
[3 seconds pass]
Mailer -> Server [SYN, ACK]
Server -> Mailer [RST]
[6 seconds pass]
Mailer -> Server [SYN, ACK]
Server -> Mailer [RST]

What's going on here?

Well, the first three packets are the classic TCP three-way handshake. The Mailer then issues a reset-acknowledge (RST, ACK) packet, which shuts down the conversation. Then things get weird. Three seconds pass, and the Mailer retransmits the second packet. The Server, having torn down the TCP conversation as it was told to in the 4th packet, just issues a RESET packet telling the sender there is no connection to ACK and to stop trying. This repeats 6 seconds later.

So how did the Mailer forget it had torn down the TCP connection? That is the mystery. I haven't had a chance to get a sniffer on the Mailer side of things yet, so I'm not certain what it's seeing. It could be the load-balancer is throwing a fit, and the follow-on packets at 3 and 6 seconds are from the Mailer server itself somehow.
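One possible reading, which the capture alone cannot confirm: the 3- and 6-second gaps are what you would expect from a TCP stack retransmitting an unacknowledged SYN-ACK on a timer that doubles after each attempt. That would suggest that whatever is sending those packets never saw the Server's ACK. The sketch below only checks that the observed gaps fit such a schedule; the 3-second starting timeout is an assumption picked to match the trace.

```python
def retransmit_gaps(initial_timeout_s: float, retries: int) -> list:
    """Gaps between successive retransmissions when the timeout doubles
    after every unanswered attempt (exponential backoff)."""
    return [initial_timeout_s * (2 ** attempt) for attempt in range(retries)]

# Gaps seen in the trace between the repeated SYN, ACK packets.
observed_gaps_s = [3.0, 6.0]

# Assumption: the sender starts with a 3-second retransmission timeout.
predicted_gaps_s = retransmit_gaps(initial_timeout_s=3.0, retries=2)

print("fits doubling backoff:", predicted_gaps_s == observed_gaps_s)
```

If that reading holds, a third SYN-ACK roughly 12 seconds after the second retransmission would be the next thing to look for in a longer capture.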

Strange things.

Desktop virtualization

Virtualizing the desktop is something of a rage lately. Last year when we were still wondering how the Stimulus Fairy would bless us, we worked up a few proposals to do just that. Specifically, what would it take to convert all of our labs to a VM-based environment?

The executive summary of our findings: It costs about the same amount of money as the regular PC replacement and imaging cycle, but saves some labor compared to the existing environment.

Verdict: No cost savings, so not worth it. Labor savings not sufficient to commit.

Every dollar we saved in hardware in the labs was spent in the VM environment. Replacing $900 PCs with $400 thin clients (not their real prices) looks cheap, but when you're spending $500/seat on ESX licensing/Storage/Servers, it isn't actually cheaper. The price realities may have changed from 12 months ago, but the simple fact remains that the stimulus fairy bequeathed her bounty upon the salary budget to prevent layoffs rather than spending on swank new IT infrastructure.
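The per-seat arithmetic, using the same illustrative (not real) prices, looks like this. The 1,800-seat figure is the lab-seat count quoted elsewhere on this blog and is used here only to scale the result.

```python
# Illustrative figures from the text, not real prices.
PC_COST = 900
THIN_CLIENT_COST = 400
BACKEND_COST_PER_SEAT = 500  # ESX licensing, storage, servers
LAB_SEATS = 1_800            # assumption: lab-seat count from another post

def per_seat_saving(pc: int, thin_client: int, backend: int) -> int:
    """Hardware saving per seat once the VM back end is paid for."""
    return pc - (thin_client + backend)

saving = per_seat_saving(PC_COST, THIN_CLIENT_COST, BACKEND_COST_PER_SEAT)
print(f"per seat: ${saving}, across the labs: ${saving * LAB_SEATS}")
```

With these numbers the hardware saving nets out to zero, so the case has to rest on labor savings alone.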

The labor savings came in the form of a unified hardware environment minimizing the number of 'images' needing to be worked up. This minimized the amount of time spent changing all the images in order to install a new version of SPSS for instance. Or, in our case, integrating the needed changes to cut over from Novell printing to Microsoft printing.

This is fairly standard for us. WWU finds it far easier to commit people resources to a project than financial ones. I've joked in the past that $5 in salary is equivalent to $1 cash outlay when doing cost comparisons. Our time management practices generally don't allow hour by hour level accounting for changed business practices.

Lemonade

Two days ago (but it seems longer) the drive that holds my VM images started vomiting bad sectors. Even more unfortunately, one of the bad sectors took out the MFT clusters on my main Win XP management VM. So far that's the only data-loss, but it's a doozy. I said unto my manager, "Help, for I have no VM drive any more, and am woe." Meanwhile I evacuated what data I could. Being what passes for a Storage Administrator around here, finding the space was dead easy.

Yesterday bossman gave me a 500GB Western Digital drive and I got to work restoring service. This drive has Native Command Queuing, unlike the now-dead 320GB drive. I didn't expect that to make much of a difference, but it has. My Vista VMs (undamaged) run noticeably faster now. "iostat -x" shows await times markedly lower than they were before when running multiple VMs.
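For anyone curious where the await number comes from, here is a rough sketch of the calculation as I understand it: the time spent on completed reads and writes during an interval, divided by the number of reads and writes completed in that interval, using the counters Linux exposes in /proc/diskstats. The two sample lines are made up so the example runs anywhere; the function names are mine, not part of any tool.

```python
def parse_diskstats_line(line: str) -> dict:
    """Pull the counters needed for await out of one /proc/diskstats line."""
    fields = line.split()
    return {
        "name": fields[2],
        "reads": int(fields[3]),      # reads completed
        "read_ms": int(fields[6]),    # milliseconds spent reading
        "writes": int(fields[7]),     # writes completed
        "write_ms": int(fields[10]),  # milliseconds spent writing
    }

def await_ms(before: dict, after: dict) -> float:
    """Average milliseconds per completed I/O between two samples."""
    ios = (after["reads"] - before["reads"]) + (after["writes"] - before["writes"])
    busy = (after["read_ms"] - before["read_ms"]) + (after["write_ms"] - before["write_ms"])
    return busy / ios if ios else 0.0

# Made-up samples standing in for two reads of /proc/diskstats.
sample_t0 = "   8       0 sda 1000 0 8000 5000 2000 0 16000 25000 0 9000 30000"
sample_t1 = "   8       0 sda 1100 0 8800 5400 2300 0 18400 28600 0 9500 34000"

result = await_ms(parse_diskstats_line(sample_t0), parse_diskstats_line(sample_t1))
print(f"await: {result:.1f} ms")
```

Because await includes time spent queued as well as time being serviced, a drive that can reorder queued requests has room to bring it down under a multi-VM load.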

NCQ isn't the kind of feature that generally speeds up desktop performance, but in this case it does. Perhaps lots of VMs are a 'server' type load after all.

Fabric merges

When doing a fabric merge with Brocade gear and they say that the Zone configuration needs to be exactly the same on both switches, they mean that. The merge process does no parsing, it just compares the zone config. If the metaphorical diff returns anything, it doesn't merge. So if one zone has a swapped order of two nodes but is otherwise identical, it won't merge.

Yes, this is very conservative. And I'm glad for it, since failure here would have brought down our ESX cluster and that's a very wince-worthy collection of highly visible services. But it took a lot of hacking to get the config on the switch I'm trying to merge into the fabric to be exactly right.
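The difference between an exact comparison and a membership comparison can be illustrated with a small sketch. This is not how Brocade implements the merge check; the zone name, the WWNs, and the dictionary layout are all invented for the example. It is the sort of pre-merge check that would have pointed at the zones needing reordering.

```python
# Hypothetical zone configs: zone name -> ordered list of member WWNs.
switch_a = {"esx_zone": ["10:00:00:00:c9:aa:aa:01", "50:06:01:60:bb:bb:00:02"]}
switch_b = {"esx_zone": ["50:06:01:60:bb:bb:00:02", "10:00:00:00:c9:aa:aa:01"]}

def strict_match(a: dict, b: dict) -> bool:
    """Order-sensitive comparison: any difference at all means no match."""
    return a == b

def same_membership(a: dict, b: dict) -> bool:
    """Comparison that ignores member order within each zone."""
    def normalise(cfg: dict) -> dict:
        return {name: sorted(members) for name, members in cfg.items()}
    return normalise(a) == normalise(b)

def order_mismatches(a: dict, b: dict) -> list:
    """Zones whose members are the same on both sides but listed differently."""
    return [name for name in a
            if name in b
            and a[name] != b[name]
            and sorted(a[name]) == sorted(b[name])]

print("strict:", strict_match(switch_a, switch_b))
print("membership:", same_membership(switch_a, switch_b))
print("reorder these zones:", order_mismatches(switch_a, switch_b))
```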

SANS Virtualization

Mr. Tom Liston of ISC Diary fame is at the SANS Virtualization Summit right now. He has been tweeting it. I wish I was there, but there is zero chance of me convincing my boss to send me. Even if it was a year in which out of state travel was allowed.

Mostly just quotes so far, but there have been a few interesting ones.

"When your server is a file, network access equals physical access" - Michael Berman, Catbird

From earlier: "You can tell how entrenched virtualization has become when the VM admin has become the popular IT scapegoat" - Gene Kim

On VM sprawl: "The 'deploy all you want, we'll right click and make more' mentality." - Herb Goodfellow, Guident

I expect to see more as the week progresses.


I have VMware Workstation installed on my workstation. It is very handy. We have an ESX cluster so I could theoretically export VMs I work up on my machine directly to the ESX hosts. I haven't done that yet, but it is possible.

Unfortunately, I've run into several performance problems since I installed it. The base system as it started, right after I switched from Xen:
  • OpenSUSE 10.2, 64-bit
  • The VM partition is running XFS
  • Intel dual core E6700 processor
  • 4GB RAM
  • 320GB SATA drives

The system as it exists now:
  • OpenSUSE 11.0 64-bit
  • The VM Partition is running XFS, with these fstab settings: logbufs=8,noatime,nodiratime,nobarrier
  • The XFS partition was reformatted with lazy-count=1, and log version 2
  • Intel quad core Q6700
  • 6GB RAM (would have been 8, but I somehow managed to break one of my RAM sockets)
  • 320GB SATA drives, with the 'deadline' scheduler set for the disk with the VM partition on it.

It still doesn't perform that great. I've done enough system monitoring to know that I'm I/O bound. I hear ext4 is supposed to be better at this than XFS, so I just might go there when openSUSE 11.2 drops. One of these would go a long way to helping fix this problem, but I don't think I'll be able to get the funding to do it.
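A quick way to confirm which I/O scheduler a disk is actually using is to read its sysfs scheduler file, where the active choice is shown in square brackets. The sketch below parses such a listing; the sample string is made up so the example runs anywhere, and on a real Linux system the text would come from a path like /sys/block/sda/queue/scheduler (the device name is an assumption).

```python
def active_scheduler(sysfs_text: str) -> str:
    """Return the bracketed (active) entry from a queue/scheduler listing,
    or an empty string if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return ""

# Made-up sample of what the scheduler file might contain.
sample = "noop anticipatory [deadline] cfq"

print("active scheduler:", active_scheduler(sample))
```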
