Archive for the ‘work’ Category
Sitting here in class (Crisis Management, so far a fun class!), I was struck by an observation that you, general public, may find useful.
Every competent information technology professional I’ve ever met has uttered the phrase, “So what happens when (foo) gets hit by a truck?” If your IT people don’t ever ask you that question, you may want to look into hiring some new IT people.
Phil had a post that led to Jessica, and her site made the blogroll. I like this one quite a bit:
Engineering Yourself Out Of A Job
This is, in fact, the right way to do my job. Make it easily reproducible, make it easily understandable. In systems administration, about 90% of what you do (after you’ve done the setup correctly) can be maintained by someone who makes half of what you make. Of course, really competent systems administrators are really, really rare… and as such the setup you start out with at a new job is hardly ever done correctly. Fixing broken setups requires a high skill level. Making migrations from one working setup to another requires pretty high skill, and in the IT business this is something you pretty much have to do once every 3-5 years.
The problem I see is the “C”. Too often I see people who are actually good at my job come into an organization, fix it (pouring in long hours and enough frustration to cause early heart failure), and six months after everything is working smoothly, circumstances lead to their departure and entropy starts to set in.
Systems administration is a thankless job. My friend Erich once called systems administration “the plumber of the 21st century”… I used to think he was right, but I now realize that one of the fundamental differences between the plumber and the sysadmin is that the plumber comes to the rescue when you are experiencing misfortune, and the sysadmin is the guy who is putting in rules you don’t like, telling you that you can’t use your iPhone with the corporate Exchange server, and is obviously at fault when something breaks. Oh, and often when you go looking for him he’s surfing the web! If your plumbing breaks, you’re grateful to the plumber when he fixes it. If your IT infrastructure breaks, you’re angry with your sysadmin until he fixes it.
From Megan and Ann, apparently today’s theme is workspaces… and since I’m busy manually installing three different machines that don’t fit under my normal support umbrella, there’s lots of time to blog between hitting “next”.
Disclaimer – every year, I end the summer season by cleaning out my office in preparation for the start of the fall term, which begins at the very end of September/beginning of October at Caltech. Once term starts, everything goes to hell until, well… about now. So what you are about to see is *not* indicative of my normal office surroundings.
I do tend to be more disorganized than Ann, but it’s not usually anywhere near this bad. Of course, that just means that there’s lots of stuff to look at in the pictures…
Here is my office door, personalized according to geek systems administrator minimal requirements:
Here Be Dragons
The comics are a smattering of PhD, Order of the Stick, Wondermark, XKCD, and Dilbert (see the links page). That’s the Onion article title “Study Reveals Pittsburgh Unprepared For Full-Scale Zombie Attack” in the lower right. My favorite Onion article, the Gillette Five Blades, is alas NSFW.
Here’s what my office looks like from the doorway:
You’ll get closeups following. Here’s my desk:
Command And Control
Pictured: a professor’s new laptop (a 1420, currently being back-ported to XP, see the previous post), my Kinesis keyboard, a wireless Logitech mouse, and 22″ monitor hooked up to a RHEL box, a Logitech web cam that only works under Windows, and my Fujitsu laptop running XP Tablet PC 2005. The desktop (visible on the lower left) has a copy of When Will Jesus Bring The Pork Chops sitting on top of Overcoming The Five Dysfunctions of a Team. Carlin’s book is subpar relative to his earlier stuff, but the Dysfunctions book is stellar. That’s a NiCad battery charger right about where my right foot would be if I were sitting at my desk in this picture.
Next up, just to the left of the desk, are a couple of file cabinets covered with stuff:
yeah, this needs to be reorganized
Scattered around in this photo: a USB DVD+/-RW drive, a jar of computer screws and jumpers, a pencil jar, spare hard drives, a box of crayons I keep forgetting to take home for Jack, a pile of Communications of the ACM magazines, a spindle of DVD-Rs, a pile of installation CDs that need to be put away, cleaning wipes for LCD flatpanels, a dozen gigs of RAM for various hardware platforms, and a copy of Tom Clancy’s Every Man A Tiger – the story of Chuck Horner, Air Force commander during Desert Storm. For people critical of the current Iraq war, I recommend you read this book.
The whiteboard, just cleaned this morning coincidentally, with various to-do lists:
Just to the left of that, a machine setup/diagnosis minidesk:
Pictured here: a rat’s nest of cables necessary to connect a workstation to that monitor and one of the two keyboards on the desk (PS/2 and USB), a lab machine that’s currently being updated (see the to do list), my labeler, some canned air, and the tail end of my Giants pennant that I’ve had since I was about six.
Then a floor bookshelf:
archives (part I)
Here we have various photos of my wife and family, a pile of IS-related schoolbooks on top of a broken laptop, more RAM, a hard drive that needs to be shipped back to Seagate, another laptop that needs a basic install, and all my old RPG rulebooks and Dragon magazines that I don’t have space to store at home. At Caltech, they represent a badge of nerdiness that helps me to interact with the customer base. Really.
Above that bookshelf, the wall-mounted one:
yes, I have a lot of books
Bottom shelf, left to right: Windex and Simple Green (just out of frame), a jar of computer bits and pens, my undergraduate mathematics textbooks, a bunch of Linux and Windows related books, and on the far right a pile of ITIL framework books. Middle shelf, left to right: delicate task wipes (just out of frame) – great for cleaning your glasses, btw – a collection of books about LaTeX, my old Windows 2000 Active Directory books from Microsoft Press, and about 80 meters of Cat 5e cable (with a bag of RJ-45 ends). Top shelf, left to right: bins with various bits (these are actually labeled correctly and usefully), a Sega Genesis in the box (yes, it works), more delicate task wipes, and a Planters Peanuts jar full of cable ties.
Next, the corner before you get back to the door:
Here we’ve got a network protocol map (just for color), my minifridge (stocked with Fresca at the moment), a box that has a Nintendo GameCube and a bunch of additional game console crud in it, the computer bits cabinet filled with nice organized trays of spare parts, three dead rackmount servers, a tool kit bag (barely visible at the bottom there), my old 22″ monitor in a box going back to the factory for an RMA, and a CaseLogic binder full of driver CDs and recovery disks that I like to keep handy.
Finally, the last bit before you get back to the door:
archive (part III)
Here you can see the incredibly noisy fan I have to run all day in order to keep this office tolerable with multiple machines running, a dead computer that needs to go to e-waste (also on the to-do list), my Saitek joystick that I used to use when I played Battlefield 1942 six years ago, more spare hard drives, a pile of video cards, and hidden in the middle on the floor a quarter sized copy of the architectural drawings for the new IST building.
Needless to say, cleaning all this up is a huge project…
Mr. Murphy moved into my office last week. He’s caused me a considerable amount of pain and annoyance in the last 7 days.
I have 20 lab machines that are a particular Dell model, known to have a hardware problem. Last year, I replaced four of them. On Monday, last week, 15 of the remaining 16 failed simultaneously. That’s just the beginning.
Today, the monitor on my desk decided that no, it wasn’t going to display 1680×1050 anymore. Period. Nope, not gonna do it! 1440×900 on this monitor makes me want to tear out my eyes. Viewsonic is requiring me to dance the Red Tape Dance to get a replacement shipped out. 10 days is the best estimate. If I have to look at this for 10 days, I’m going to be in crazy land, not to mention have one shattering long-term headache.
Everything I’ve touched in the last 8 days has been a prerequisite hell problem… in order to fix [foo], I first have to fix [bar], and before I fix [bar], I have to fix [foobar], and before I fix [foobar], I have to first find an invoice, and a receipt, and talk on the phone to someone who has a checklist written for a complete technical neophyte that requires each box to get a little black check mark before we can move on to the next line, and there’s 40 of them, and it takes an hour to convince the other person that I’ve already tried all this, thank you, I’ve been doing this job for 15 years and for chrissake just SEND ME THE PART, I CAN’T FIX [bar] or [foo] until I fix [foobar] and I can’t do that until the part GETS HERE.
This happens a lot when you work in IT (this is why #23). Usually you only have to deal with problems like this every once in a while, though. I have the Murphy’s Touch this week.
This would not be a good time for friends or relatives to call me and ask me for advice on fixing their computer. If I answer the phone, the probability is high your machine is going to outright explode. I’m just saying.
(Note: all of this is bunk. There is no magic rule that says these things happen. There’s just a huge desire for there to be a rule, because frustration this great is doubled when there is no cause.)
I bought a tablet PC (a Fujitsu) almost a year ago. I’ve mentioned it before, but I’ve been meaning to blog about it a bit more thoroughly and just haven’t gotten around to it.
Switching over to a tablet is foundationally a major change in how you use your computer. Normally, when I buy a new machine, I spend a considerable amount of time getting it tweaked *just the way I like it*. Flip this dial, turn that switch, install this widget, etc. I didn’t do that with this computer. Why? Because I wanted to use it for a while to find out how it was different, so that I could at some point in the future blow it away and reinstall it clean to *just the way I like it*. I knew that when it came to the tablet, *just the way I like it* was something that was going to be different from non-tablet computing, and I wanted to play with it for a while to find out what those differences were. More on that in my next post.
Well, I played with it for a year. I learned a lot of things about my interface with the computer. I installed a lot of software (some of which I’ll install again, some of which I decided was horrible). I hooked it up to a number of different peripherals, installed drivers, uninstalled drivers, messed with the registry, etc. I’ve hacked this thing pretty hard in the last 12 months.
I’ve killed it, finally. This was expected, so it’s no big deal. But today I plugged it into my docking station here at work and it’s decided that it can’t recognize my external display’s native resolution (I’ll post about that too, someone else has had this problem). The difference between 1600 x 1200 and 1650 x 1280 doesn’t seem like a lot, but looking at any display in a non-native resolution is like listening to a symphony with the strings section muted 50%… it drives me nuts. I reapplied the fix that made this problem go away 8 months ago, no dice. One of the other devices I’ve installed (the webcam, the wireless mouse, the printer, some native Fujitsu driver, whatever) is futzing something up. That’s more or less normal for a Windows box that’s about a year old, anyway.
So, I have to take off and nuke the entire site from orbit. It’s long overdue. It’s going to drive me to make a couple of changes in how I use my computer on a daily basis, instead of doing things halfway between how I used to do them and how I do them now. I’ll finally be using 80% of the tablet’s functionality. I’ll actually post a bit about the machine, in hopes that any gentle readers might learn something interesting.
Now if I can just find the installation disk…
I’ve been working in the IT industry in one way or another since I graduated college in 1993. That’s 15 years now… wow, seems like it hasn’t been that long.
I’ve been involved with many different IT projects in many different organizations, and I’ve seen or heard or been exposed to a thousand more. I’ve seen successes and I’ve seen failures. Overall, more failures than successes. This shouldn’t be a surprise to anybody, the industry storybook is rife with tales of colossal failures… maybe 5 failures for every success.
Here’s why IT projects fail. I’m going to tell you all, so that you’ll know (if you’re a sysadmin or a programmer or whatever) how to avoid them, or you’ll know (as a non-IT person), how to recognize when your IT department is starting something that is very very likely to cost a bucket of money and return very little, except to give you fodder to rake them over the coals when you’re at the water cooler with someone else from Accounting.
There is no such thing as a technological solution. There is no problem that you can solve with technology. Stop thinking that you can, because when your thinking starts at that point, you’ve already started building a foundation without checking to see whether or not the ground can support any weight.
When you’re an IT worker, people bring you problems all the time. Sometimes, they’re not really “problems” at all -> there’s a bug in some software, or something is mis-configured, or some other thing that may take you minutes or hours to fix. This is really the equivalent of putting a band-aid on a wound. The real goal is to prevent infection until the wound heals. Eventually, the software will be replaced with a new version, or the main router will come back online, or whatever… and the work that you’re doing now will be essentially wasted time. Important time, granted… customer-service enabling time -> you’re saving them time at the expense of your own.
With these sorts of problems, you’re a mechanic. You’re a plumber. You’re finding out what doesn’t work in a technological system and patching it or working around it. This is the grunt work, the scut work, the stuff that keeps us employed on a daily basis. You’re not providing a solution to a problem. You’re hacking. This isn’t a bad thing, it needs to be done. But this is firefighting. Optimally, you want to do as little of this as possible, because you’re at heart very lazy, and you know your customers want everything to “just work”.
Real problems start deeper. “I need a way to let people see my time schedule” is a problem which requires a solution. “My administrative assistant can’t sync my Treo to the corporate Exchange server” isn’t a problem that requires a solution -> it’s a bug that needs a hack. When people bring you bugs, hack. When people bring you problems, you need to build a solution.
This always, always, always needs to start with information gathering. Period. Always. If you’ve worked in four organizations before, and you’ve run Exchange, and someone comes to you with “I need a way to let people see my time schedule”, odds are very very good you’re going to blurt out, “Well, I could set up an Exchange server…”
Don’t. Cease. Back up. You’re doing it wrong. Period.
You’ve made the first mistake, you started building a house… and you don’t even know that what the customer wants is a house.
Sometimes, someone comes to you with, “I want you to set up an Exchange server…” and you’re going to blurt out, “Okay, I’ve done that before, it’s pretty easy…”
Don’t. Cease. Back up. You’re doing it wrong. Period.
You’ve made the second mistake, you started building a Victorian because someone told you they think they need a house. The customer doesn’t know what they need. They know what they *want*. It’s your job to figure out if what they *want* is actually what they *need*. Moreover, it’s your job to know if what they need is possible. Sometimes, it’s not.
If you tell them that it *is* possible because your boss is scary and shouts and says, “Don’t tell me what’s impossible,” when you argue with him, I’m begging you… get into another line of work. Eventually you’re going to get fired, or you’re going to get fed up and quit, and the next poor bastard who comes in is going to spend months of aggravation trying to fix the piece of junk you built because you didn’t have the gumption to tell someone that they ought not to build a skyscraper on top of a bog.
The only thing you can do with technology is operationalize a solution. Information Technology work is *enabling* work. We take solutions and we build stuff to make them happen… but the solution has to already be known to some degree. You have to design a process before you start building an object. If you don’t, you’re going to build a really pretty object that nobody uses. You need to know what it is, not necessarily in minute detail… but you’d damn well better have a good idea that it’s supposed to be a house, if it’s supposed to be a house. Whether or not it’s a Victorian or a ranch or a McMansion is important, but it’s not as important as starting off in a residential zoning area.
You need to keep your eye, always, on the solution… and NOT on the technology. If the technology doesn’t fit the solution perfectly… well, that’s not always bad, and that’s not entirely unexpected. You can’t redefine success by changing the game to “I successfully deployed this technology” because deploying the technology isn’t what the customer wants, they want the problem solved. Define what subset of the problem the technology is fixing, and make sure your customer is satisfied with that subset before you build the thing.
And if they want you to build a Victorian and you’re in a commercial zone, suck it up and tell them “No.”
A follow-up on an earlier post about building machine rooms.
One of the additional difficulties in designing high-capacity server space is the problem of heat transfer. I’m not a mechanical engineer (and really, most people who read this blog aren’t interested in learning a couple college semesters worth of thermodynamics and practical HVAC engineering concepts, not to mention industry standards). In a nutshell, if you’re trying to cool down something that’s really hot, you have limited options.
There are a few products out there that attempt to help you solve this problem. Emerson/Liebert produces a contained server enclosure called the XD, in two configurations (25kW and 17kW). Rittal Corporation produces a modular refrigerator/server cabinet system called the LCP+ that can be configured in a variety of ways. Of course APC has its own solution called InfraStruXure that also handles cooling in an integrated fashion.
None of these solutions is low cost, on the face of it. On the other hand, buying a pair of XDs (or a row of LCP+ units with the Rittal cabinet enclosures) is much more practical for the purposes of chilling a few racks (or even a smallish datacenter) than trying to retrofit an existing building. Using one of these solutions makes it pretty easy on your facilities manager -> bolt the sucker to the structure of the building, hook up a 3/4″ chilled water pipe and return, and run a big power circuit into the room. This (as expensive as it may be) is probably still going to be significantly cheaper than trying to build out a small datacenter in a converted closet in the back corner of your leased office space. And, you can take it with you when your lease is up, which you probably won’t bother to do if you’ve stuck a 35 ton refrigeration unit in a retrofitted room.
We’re building a new building at my place of employment, and I’ve been working on fitting 25 racks with a design parameter of 25kW per rack into a space that’s about 1,000 square feet. We’ve looked pretty exhaustively at both the Rittal and the Emerson packages, and all other things being equal, here’s my considered opinion.
If you’re building a data center from scratch, both of these solutions are pretty damn good, and they both have minor advantages and disadvantages in their design. In my opinion (again, no engineer here), the Emerson product is slightly better engineered not to fail as a standalone unit, but the Rittal product is designed to fail more modularly and gracefully. What this means for your organization is dependent upon your reliability requirements. Both of them enable you to fit high-power compute clusters in a very small space. The Emerson product is less flexible than the Rittal product since it is self contained, but the obvious flip side to that is that the Emerson product is much easier to add on in a smaller increment; if you’re planning on adding compute capacity on a 1-rack every 6 months basis, it’s easier to buy one Emerson 25kW unit every six months than to buy 1 row of Rittal cabinets with LCP+ units every year and a half. If four years pass, and you need to upgrade your coolant capacity because you just swapped out 42 1U dual core 2.6GHz machines with 42 1U ten-core 4.8 GHz machines, you can buy one or two more LCP+ units and tack them onto your enclosure. You can’t really do that with the Emerson solution.
They both work in particular scenarios, depending upon your maximum power load available, your chilled water supply, and the number of machines you want to power up (and how often you want to replace them, or add more). I have a slight preference for the Rittal units because the heat exchanger is on the side instead of the bottom. With the Emerson XD solution, your 42U rack is elevated about 14″, which means racking stuff up at the top of the rack requires a lift or a platform; or an employer willing to violate OSHA regulations and an employee strong enough to lift a 100 lb server over his or her head. On the other hand, if square footage is your constraint, you can fit more XDs in the same space that you’d have to dedicate to the LCP+ solution. Both sets of sales guys were excellent, friendly, and fairly responsive when it came to getting me information.
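For a rough sense of scale, the design target mentioned above (25 racks at 25kW per rack in about 1,000 square feet) can be turned into a back-of-the-envelope heat load. This is only a sketch, not an engineering calculation: it assumes every watt delivered to a rack ends up as heat that has to be removed, and that every rack runs at its full design load at once. The function name `heat_load` is made up for illustration; the only outside fact used is the standard conversion of one ton of refrigeration to 12,000 BTU/h, or about 3.517 kW.

```python
# Back-of-the-envelope heat load for a machine room.
# Assumption: all electrical power delivered to the racks becomes heat,
# and every rack runs at its full design load simultaneously.

KW_PER_TON = 3.517  # one ton of refrigeration = 12,000 BTU/h, about 3.517 kW


def heat_load(racks: int, kw_per_rack: float, floor_sqft: float) -> dict:
    """Return total load, power density, and cooling tonnage (hypothetical helper)."""
    total_kw = racks * kw_per_rack
    return {
        "total_kw": total_kw,
        "watts_per_sqft": total_kw * 1000 / floor_sqft,
        "cooling_tons": total_kw / KW_PER_TON,
    }


# The numbers from the post: 25 racks, 25 kW each, roughly 1,000 square feet.
load = heat_load(racks=25, kw_per_rack=25, floor_sqft=1000)

print(f"Total load:    {load['total_kw']:.0f} kW")
print(f"Power density: {load['watts_per_sqft']:.0f} W per sq ft")
print(f"Cooling:       {load['cooling_tons']:.0f} tons of refrigeration")
```

Under those assumptions the room works out to 625 kW, or roughly 178 tons of cooling, which gives some idea why chilled water plumbed straight to the rack looks attractive at this density.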
If you’re blog-searching looking for more information about these sorts of solutions, drop me a comment.