So we chose our DJ and photographer. We haven't actually signed the contracts yet, but the checks are written. Next up are the videographer and the florist.
I've never really written about what I actually do at work, partly because I'm not sure what I'm allowed to talk about, partly because I'm not sure I'd explain it well enough for people not familiar with SANs to understand, and partly because I'm not sure that even if people understood, they'd actually care. I've decided that the last two concerns cancel out the first, which was the largest. This, then, is what I do for a living. (Sorry for not doing an lj-cut, but I really hate those.)
Brocade, despite all the claims by management that we're a software company, is actually a hardware company. We make switches for storage area networks (SANs). Almost all of our products are Fibre Channel (FC) based. FC, like SCSI, names both a protocol and a transport layer, and FC-the-transport is primarily used to carry SCSI-the-protocol, wrapped inside FC frames. We're also starting to come out with FCIP (FC over IP) and iSCSI products. Our old switches were either 8 or 16 ports, with VxWorks running on an i960. All of the new hardware runs a PPC, either a 405 or a 440 variant, with Linux (derived from MontaVista Linux) as the base OS. All of the new hardware, that is, except the product we acquired when we bought Rhapsody; their box runs NetBSD on a PPC 7xx series (I don't know too many details about it). The Rhapsody box is 16 ports; the Brocade hardware is either a pizza-box style in 8, 16, or 32 ports, or a bladed system with 16 ports per blade and up to 8 blades (plus two control blades).
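To make that layering concrete, here's a sketch of the standard 24-byte FC frame header. This comes from the FC framing spec, not from anything Brocade-specific, and the struct is purely illustrative; the point is that when the TYPE field says FCP, the payload is SCSI riding inside an FC frame:

    /* Illustrative sketch of the standard 24-byte Fibre Channel frame
     * header (per the FC framing spec; not Brocade source code).
     * When type == 0x08, the payload is FCP, i.e. SCSI commands and
     * data wrapped in FC frames. */
    #include <stdio.h>
    #include <stdint.h>

    struct fc_frame_header {
        uint8_t  r_ctl;      /* routing control: frame category */
        uint8_t  d_id[3];    /* destination port address (24-bit) */
        uint8_t  cs_ctl;     /* class-specific control */
        uint8_t  s_id[3];    /* source port address (24-bit) */
        uint8_t  type;       /* payload type; 0x08 = FCP (SCSI) */
        uint8_t  f_ctl[3];   /* frame control flags */
        uint8_t  seq_id;     /* sequence identifier */
        uint8_t  df_ctl;     /* data field control */
        uint16_t seq_cnt;    /* frame count within the sequence */
        uint16_t ox_id;      /* originator exchange ID */
        uint16_t rx_id;      /* responder exchange ID */
        uint32_t parameter;  /* e.g. relative offset for data frames */
    };

    int main(void)
    {
        printf("FC header is %zu bytes\n", sizeof(struct fc_frame_header));
        return 0;
    }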
So the way it works is you buy one or more of our switches and hook them together; this becomes what's called a "fabric" (hence the name Brocade). Then you buy a bunch of FC host bus adapters (HBAs, the FC equivalent of an Ethernet NIC) and put them into your servers and storage devices (many storage devices have an FC adapter already). Then you connect all the HBAs and devices to the switches, and they're able to talk to each other. Each host (called an initiator) finds out from the switch (specifically the name server) what other devices are in the fabric and logs into them (these become the targets).
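In rough C, the dance an initiator goes through looks something like the sketch below. The helper functions are made up for illustration (a real HBA driver is much more involved), but the well-known fabric addresses are real:

    /* A hand-wavy sketch of what an initiator does when it joins a
     * fabric. The primitives are hypothetical; the well-known
     * addresses come from the FC spec. */
    #include <stdio.h>
    #include <stdint.h>

    #define FABRIC_FLOGI_ADDR 0xFFFFFE /* fabric login server */
    #define NAME_SERVER_ADDR  0xFFFFFC /* directory/name server */

    /* Hypothetical driver primitives. */
    uint32_t fabric_login(uint32_t well_known);             /* FLOGI */
    int      name_server_query(uint32_t ns, uint32_t *out, int max);
    int      port_login(uint32_t target);                   /* PLOGI */
    int      process_login(uint32_t target);                /* PRLI */

    void discover_targets(void)
    {
        uint32_t targets[64];
        int n, i;

        /* 1. Log into the fabric; the switch assigns a 24-bit address. */
        uint32_t my_addr = fabric_login(FABRIC_FLOGI_ADDR);

        /* 2. Ask the name server which other ports are in the fabric.
         *    Zoning is enforced here: we only hear about ports we're
         *    allowed to talk to. */
        n = name_server_query(NAME_SERVER_ADDR, targets, 64);

        /* 3. Log into each target, then establish a SCSI session. */
        for (i = 0; i < n; i++) {
            if (port_login(targets[i]) == 0)
                process_login(targets[i]);
        }
        (void)my_addr;
    }

    /* Stubs so the sketch compiles; a real driver talks to the HBA. */
    uint32_t fabric_login(uint32_t wka) { (void)wka; return 0x010200; }
    int name_server_query(uint32_t ns, uint32_t *out, int max)
    { (void)ns; (void)max; out[0] = 0x010300; return 1; }
    int port_login(uint32_t t) { (void)t; return 0; }
    int process_login(uint32_t t)
    { printf("SCSI session with %06X\n", t); return 0; }

    int main(void) { discover_targets(); return 0; }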
Our largest customers buy dozens of hundred-port switches and put thousands of devices (hosts and targets) on a single fabric. By default, every device can see every other device, which makes for millions of interactions. To be honest, even if our switches were powerful enough to handle all of that, I'm not sure there's enough bandwidth to make it at all usable. So there's this feature we sell called zoning. Every switch has the same copy of the zone database. The zone database holds various configurations; each configuration has a set of zones; each zone lists the devices that can talk to each other. You can, of course, have the same device in multiple zones; so if you have devices A, B, and C, and you want B to be able to talk to both A and C but don't want A and C to talk to each other, you can do that easily.
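A toy model of that data structure, and of the A/B/C example, might look like this (the names and layout are mine, not Brocade's actual structures):

    /* Toy zone model: a configuration is a set of zones, a zone is a
     * set of members, and two devices may talk iff some zone contains
     * both of them. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_MEMBERS 8

    struct zone {
        const char *name;
        const char *members[MAX_MEMBERS];
        int         nmembers;
    };

    static int zone_has(const struct zone *z, const char *dev)
    {
        for (int i = 0; i < z->nmembers; i++)
            if (strcmp(z->members[i], dev) == 0)
                return 1;
        return 0;
    }

    /* Can a and b talk? Only if they share a zone. */
    static int can_talk(const struct zone *cfg, int nzones,
                        const char *a, const char *b)
    {
        for (int i = 0; i < nzones; i++)
            if (zone_has(&cfg[i], a) && zone_has(&cfg[i], b))
                return 1;
        return 0;
    }

    int main(void)
    {
        /* B is in both zones, so B<->A and B<->C work, but not A<->C. */
        struct zone cfg[] = {
            { "zone_ab", { "A", "B" }, 2 },
            { "zone_bc", { "B", "C" }, 2 },
        };
        printf("A-B: %d\n", can_talk(cfg, 2, "A", "B")); /* 1 */
        printf("B-C: %d\n", can_talk(cfg, 2, "B", "C")); /* 1 */
        printf("A-C: %d\n", can_talk(cfg, 2, "A", "C")); /* 0 */
        return 0;
    }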
I used to work on zoning. Zoning really isn't that complicated conceptually, but for some reason it always ended up being much harder in practice. I think most of the difficulty comes from the two requirements we place on zoning: first, that all switches in the fabric always have the same zone database; second, that you can modify that database from any switch in the fabric. So when new switches join the fabric, they have to merge their zone databases; when you change the zone database, you have to propagate it to all the remote switches. Once you have the zone database locally, you have to do the hardware-based enforcement, and hand it to the name server so it doesn't tell devices about other devices they're not allowed to talk to.
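The merge step alone gives a flavor of why this gets hairy. Very roughly (and glossing over the element-by-element comparison a real merge does, and representing a whole database as a string for brevity), the decision when two switches join looks like:

    /* Simplified sketch of the merge decision when two switches join:
     * an empty database adopts a non-empty one, identical databases
     * are fine, and a conflict isolates ("segments") the link so the
     * fabrics stay apart. */
    #include <stdio.h>
    #include <string.h>

    enum merge_result { MERGE_OK, MERGE_ADOPT_REMOTE, MERGE_SEGMENT };

    static enum merge_result zone_merge(const char *local_db,
                                        const char *remote_db)
    {
        if (local_db[0] == '\0' && remote_db[0] != '\0')
            return MERGE_ADOPT_REMOTE;        /* take the remote copy */
        if (remote_db[0] == '\0' || strcmp(local_db, remote_db) == 0)
            return MERGE_OK;                  /* already consistent */
        return MERGE_SEGMENT;                 /* conflict: isolate link */
    }

    int main(void)
    {
        printf("%d\n", zone_merge("", "cfg1"));      /* adopt remote */
        printf("%d\n", zone_merge("cfg1", "cfg1"));  /* ok */
        printf("%d\n", zone_merge("cfg1", "cfg2"));  /* segment */
        return 0;
    }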
Anyway, that's what I worked on from the time I joined Brocade in late June 2002 through about late June/early July 2004. I worked on it long enough to become an expert in certain areas, so I still get questions about it. But I haven't actually worked on that code in probably four or five months.
Then in July 2004, I joined the Router group. The router is designed to let two fabrics share devices without joining the fabrics. See, merging fabrics is an extremely disruptive process, and sometimes isn't even possible. The router makes it work by keeping the fabrics isolated and creating phantom devices in each fabric. All you have to do is modify the zone database on each fabric so that both fabrics say which devices are supposed to be shared.
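Conceptually, a phantom is just a proxy entry: the router exports a real device from its home fabric into the other fabric's name server under an address the router owns. Here's a cartoon of the idea (the names, structures, and addressing scheme are invented for illustration):

    /* Cartoon of device sharing through the router. Each fabric keeps
     * its own address space; the router presents a remote device as a
     * "phantom" with an address it controls. Purely illustrative. */
    #include <stdio.h>
    #include <stdint.h>

    struct phantom {
        uint32_t real_addr;     /* device's address in its home fabric */
        int      home_fabric;
        uint32_t phantom_addr;  /* proxy address in the remote fabric */
        int      remote_fabric;
    };

    /* The router allocates proxy addresses from ports it controls in
     * the remote fabric, then registers them with that fabric's name
     * server like any other device. Hypothetical scheme: a
     * router-owned domain 0x7F, area = slot. */
    static uint32_t alloc_proxy_addr(int remote_fabric, int slot)
    {
        (void)remote_fabric;
        return (0x7Fu << 16) | ((uint32_t)slot << 8) | 0x01;
    }

    int main(void)
    {
        struct phantom p = {
            .real_addr = 0x010200, .home_fabric = 1,
            .remote_fabric = 2,
        };
        p.phantom_addr = alloc_proxy_addr(p.remote_fabric, 0);
        printf("fabric %d sees device %06X as phantom %06X\n",
               p.remote_fabric, p.real_addr, p.phantom_addr);
        return 0;
    }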
The router is currently based on the Rhapsody hardware, mainly because until recently the Brocade hardware couldn't do the network address translation that routing requires. But with our most recently released ASIC, we can do enough of the translation to make routing possible. So my current job is to help port the router from the Rhapsody NetBSD system (called XPath) to the Brocade Linux system (called Fabric OS, or FOS).
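The translation itself is conceptually just rewriting the 24-bit source and destination addresses in each frame as it crosses the router, much like IP NAT. A minimal sketch, with a made-up translation table:

    /* Sketch of the NAT step on a frame headed into the remote
     * fabric. The table and addresses are invented; real hardware
     * does this per-frame at line rate. */
    #include <stdio.h>
    #include <stdint.h>

    struct xlate_entry {
        uint32_t phantom_addr; /* address the sender actually targeted */
        uint32_t real_addr;    /* device's true address back home */
        uint32_t proxy_src;    /* what the sender should appear as */
    };

    static void nat_outbound(uint32_t *d_id, uint32_t *s_id,
                             const struct xlate_entry *e)
    {
        if (*d_id == e->phantom_addr) {
            *d_id = e->real_addr;  /* deliver to the real device */
            *s_id = e->proxy_src;  /* replies route back via router */
        }
    }

    int main(void)
    {
        struct xlate_entry e = { 0x7F0001, 0x020400, 0x7E0001 };
        uint32_t d_id = 0x7F0001, s_id = 0x010200;
        nat_outbound(&d_id, &s_id, &e);
        printf("frame now %06X -> %06X\n", s_id, d_id);
        return 0;
    }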
Around late November, most of the original functionality was working; it wasn't in a saleable condition, but as a proof of concept it worked rather well. The biggest part of porting to FOS, though, is that FOS has High Availability (HA). This means that on a bladed system with two control blades, one acts as a backup in case the active blade fails for any reason (most typically a core dump). It also means that you can upgrade the firmware on a pizza-box (or a bladed system) without disrupting device traffic. So for the last month and a half, I've been working on adding the HA component to the router. On Friday I finally finished all of the necessary infrastructure, and my coworkers finished going through the code adding all the necessary hooks. So in theory it's nearly pre-alpha quality.
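From a daemon's point of view, the gist of those hooks is that every state change on the active control blade gets mirrored to the standby, so a failover (or a firmware upgrade) can resume warm instead of cold. The API below is entirely made up; FOS has its own sync framework:

    /* Made-up illustration of HA state mirroring: every mutation on
     * the active blade is shipped to the standby, which can then
     * resume without devices ever seeing the fabric go away. */
    #include <stdio.h>

    struct route_state {
        int  entries;
        char last_event[64];
    };

    static struct route_state active, standby;

    /* Hypothetical hook: ship a state update to the standby blade
     * (stand-in for an IPC message across the backplane). */
    static void ha_sync(const struct route_state *s)
    {
        standby = *s;
    }

    static void add_route_entry(const char *why)
    {
        active.entries++;
        snprintf(active.last_event, sizeof active.last_event, "%s", why);
        ha_sync(&active);          /* every mutation gets mirrored */
    }

    int main(void)
    {
        add_route_entry("phantom device imported");
        add_route_entry("zone update");

        /* Failover: the standby resumes from mirrored state. */
        printf("standby resumes with %d entries (last: %s)\n",
               standby.entries, standby.last_event);
        return 0;
    }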
The actual product we're working on, to get all of the features we want, requires some additional hardware, which is being provided by an FPGA. There are other hardware requirements from other groups, so the product will only be available on a new blade (and possibly later on a pizza-box) whose design and production aren't finished yet. The platform group will get the hardware sometime this month or maybe February, and we're supposed to be able to test the router on the new blade early this year. Hopefully the OEMs will get it in early 2006.
So my primary job is to get the router up and running on FOS by 2006. But then there are also other small projects that I keep getting involved in. The latest one is the build infrastructure for FOS. See, originally we only had a VxWorks-based product, and all of the build tools were only available for SunOS. So all of the engineers had a Sun SPARC running SunOS, plus a Windows system (typically a laptop). Then we created a Linux PPC product and didn't want to buy new development hardware. Some of the engineers, including myself, are big Linux advocates and have been working on turning our own private Linux boxes (I turned my Windows laptop into a Linux laptop, and somehow lost my Sun along the way) into a working development platform. That finally works, and I've been doing all of my development on Linux for the last year and a half or so.
Anyway, sort of through that I got involved in a lot of the build infrastructure, and started working on speeding it up. As FOS has grown, the build has gotten slower and slower; on dedicated hardware, just the kernels (one for each of the 4xx PPCs) and the daemons take more than two hours to build. For an individual developer it can take more than four hours, and sometimes upwards of eight. So now that x86 Linux is a viable development platform, we're getting a bladed system from Dell and working to move all the engineers to it. Also, we use ClearCase dynamic views, which put a lot of strain on the network, so we're moving to snapshot views so that all of the files are on local disk. We're hoping to get the build for the average engineer down to 15 minutes. Plus we're hoping to fix all of the Makefiles so that incremental builds work, and a complete build isn't necessary as often as it currently is.
So that's what I do: formerly zoning, currently routing, sometimes build.