I’ve spent the last few days migrating Shirt Pocket from Panther Server to Tiger Server, and while it mostly went well, it could have gone smoother. Much smoother.
Hopefully, this post will help others in a similar situation. If you’re not technical, this is going to be oh-so-much gobbledegook: sorry.
The Basic Problem
Unlike an upgrade between the non-Server versions of Panther and Tiger, Server doesn’t offer any nice “Upgrade” or “Archive and Install” options. Instead, it’s a lovely clean install every time. And by lovely, I mean not lovely.
Of course, I really couldn’t bring the main server down and do some sort of multi-day transfer, so I settled on a plan. Namely, I’d take the brand new drive I planned to use (a Maxtor MaxLine III 300GB SATA), connect it to a WiebeTech SATA Dock, plug that into a G5 iMac that I usually use for testing, and install Server to it.
Which I did. Bringing up the bare-bones server configuration, adding users, and so on was the easy part.
Things I Couldn’t Leave Behind
On my server, I run:
- Apache
- Kerio Mail Server
- FogBUGZ (for bug tracking, customer support, etc)
- vBulletin (our Forums)
- Expression Engine (this blog)
- Mint
Simple enough. But not. Many of the above won’t run out-of-the-box on Panther
or Tiger Server. You need updated versions of PHP, extensions for Apache, updated MySQL—a whole bunch of stuff.
Prepackaged goods
Previously, I’d used Marc Liyanage’s installer packages to get most of this running, and while that worked OK, my Apache server was literally crashing every few seconds due to some weird bug. And there was no real way to update his installs until he released new packages, so I was stuck with stuff well past its update-by date.
Darwin Ports to the Rescue
So, instead, I decided to use Darwin Ports to install PHP4 (FogBUGZ doesn’t support PHP5), Apache 2.2, and MySQL 5, all with the options needed to support the above.
For those of you used to GUIs (yes, I know there are various DP GUIs), Darwin Ports would be a bit of a shock. It’s command-line only, and while it does a whole lot of great stuff, it’s a bit, well, clunky. But, I was happy to see that the PHP4 port had what I needed in it.
The Easy(ish) Part
After installing Xcode and the developer tools, I installed the latest version of Darwin Ports and got to work. From what I could determine, the command to use was:
sudo port install -v -c php4 +mysql5 +server +apache2 +imap +macosx +darwin_8
That one command downloaded the sources necessary to build MySQL, PHP4 and Apache 2, tweaked them to support OS X and Tiger, built the whole thing, and installed it all into the default Darwin Ports folder, /opt/local.
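(If you’re not sure what variants a given port offers, Darwin Ports can list them; that’s how you’d find the +mysql5 and +apache2 options above:

port variants php4

Handy, if not exactly discoverable.)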
Configuring the Configuration
The first real challenge was getting Apache 2 configured for the Shirt Pocket site. While Apache 2 is mostly backward compatible with Apache 1.3, the httpd.conf syntax is a bit different, especially with regard to extensions. And, Apple’s Admin Console GUI doesn’t do Apache 2. So, I had a few hours’ work tweaking it to support our various realms, virtual servers and sites.
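For the curious, here’s roughly the shape of a per-site block under Apache 2.2. The names and paths are placeholders rather than our actual config, but the directives are standard 2.2 (the old mod_auth directives have moved to the mod_auth_basic/mod_authn_file pair):

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/opt/local/apache2/htdocs/example"
    <Directory "/opt/local/apache2/htdocs/example/private">
        AuthType Basic
        AuthName "Members Only"
        AuthUserFile "/opt/local/apache2/conf/htpasswd"
        Require valid-user
    </Directory>
</VirtualHost>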
Similarly, I had to hand-configure MySQL 5 and PHP4 to add the various options we needed, including support for Kerio’s sendmail and the like.
An hour or two, and a few port uninstalls and reinstalls, later (note that the PHP4 port’s uninstall/clean leaves bits of the install behind in its folders, and the port won’t reinstall properly until you clean those up by hand), I had this ready to go.
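If you hit the same thing, something along these lines does the cleanup (the clean flag and build path here are from memory, so treat them as assumptions and poke around /opt/local/var if they’re not right):

sudo port uninstall php4
sudo port clean --work php4
ls /opt/local/var/db/dports/build    # look for php4 leftovers here and remove them by hand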
Basics Are Go!
After testing things with apachectl and verifying the syntax, I used launchctl (the launchd control CLI) to add the automatically created launchd plists (nice, Darwin Ports!) into launchd and restarted. To my amazement, basic stuff was up and going.
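Loading each plist is a one-liner. The names below are my guess at what Darwin Ports generates; check /Library/LaunchDaemons for the real ones:

sudo launchctl load -w /Library/LaunchDaemons/org.darwinports.apache2.plist
sudo launchctl load -w /Library/LaunchDaemons/org.darwinports.mysql5.plist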
Copying in the Site
So—next, I had to bring the data over from the other server. First, I used scp to copy over the web site files, including the installed copy of vBulletin, Expression Engine and Mint. The site seemed to work fine, although I’d neglected to create our Apache authenticated users. Once fixed, that worked too.
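The copy itself was nothing fancy. With placeholder paths (your document root may well live elsewhere), it amounts to something like:

scp -rp g5server:/Library/WebServer/Documents /Library/WebServer/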
Welcome to The Suck
For FogBUGZ, I did a new install, and here things started to go way, way downhill.
FogBUGZ is highly dependent on its expected configuration. In fact, since they recommend the Liyanage packages, they’ve hardwired their paths to where he installs things in the /usr/local hierarchy, and they also assume that httpd.conf is in /etc. My stuff was in /opt/local, in places FogBUGZ never knew to look, so FogBUGZ wouldn’t even install. Arrgh!
FogBUGZ support was kind enough to provide me with a rough list of prerequisites and checks they were doing, so I spent some time faking it out. After a lot of starts and stops, I determined I had to do the following (sketched as shell commands after the list):
- Symlink /usr/bin/php to /opt/local/bin/php4
- Symlink /usr/local/bin to /opt/local/bin
- Create /usr/local/php
- Symlink /usr/local/php/bin to /opt/local/lib/php4
- Create /usr/local/php/lib
- Symlink php.ini to /opt/local/etc/php.ini
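In shell terms, that list boils down to something like this (the final php.ini location is my guess at where FogBUGZ expects to find it):

sudo mkdir -p /usr/local/php/lib    # creates /usr/local and /usr/local/php too
sudo ln -s /opt/local/bin/php4 /usr/bin/php
sudo ln -s /opt/local/bin /usr/local/bin    # assumes /usr/local/bin doesn't already exist
sudo ln -s /opt/local/lib/php4 /usr/local/php/bin
sudo ln -s /opt/local/etc/php.ini /usr/local/php/lib/php.ini    # location is an assumption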
Starting to Run
Once that was done, FogBUGZ was able to install. After install, I had to tweak some configuration files:
- Copy the fogbugz.conf additions FogBUGZ had made to /etc/httpd/httpd.conf into /opt/local/apache2/conf/httpd.conf
- Adjust various settings in php.ini
FogBUGZ mostly came up at this point, but additional PEAR components needed to be installed (including one that FogBUGZ didn’t prompt for). Darwin Ports’ PEAR install is split over two folders, with some components in /opt/local/lib/php4 and some in /opt/local/share/pear, so I had to change php.ini to extend the include_path.
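Something like this php.ini line should cover both locations (adjust to your own install):

include_path = ".:/opt/local/lib/php4:/opt/local/share/pear"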
I’m Bugged
Bingo! While that took a number of hours to figure out and tweak, FogBUGZ was now up, running with a totally empty database.
So, next, data migration. Under Panther, all of the above had data in MySQL 4 in various databases and tables. Normally I’d use mysqldump, copy the files across and import them, but Bruce—who’d recently migrated machines himself—suggested using ssh to run it in “one pass”. I decided to dry-run most of the data to make sure everything worked well, while leaving the main system up (which, of course, meant the data would have to be reimported later on).
For example, to migrate the vBulletin forums, I needed to create an empty database on the new machine with the same name. Then, I ran (as root):
ssh g5server /usr/bin/mysqldump --user the-database-user --password=the-password --add-drop-table the-forum-database | /opt/local/bin/mysql5 -u the-database-user -pthe-password the-forum-database
One authentication and a few minutes later, vBulletin’s data was moved over and… vBulletin came right up!
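For completeness: creating the empty target database beforehand is its own one-liner. With a placeholder name, something like this does it:

/opt/local/bin/mysql5 -u root -p -e 'CREATE DATABASE the_forum_database'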
Repeated for Expression Engine: bingo!
And, mint: no problem at all.
So, the technique worked, and the configuration seemed fine: things were up and running pretty well.
TTL - Way Long
Although FogBUGZ could be done the same way, it was going to take a lot longer because its database was absolutely huge (it takes over 14 hours to rebuild). And Mail wasn’t MySQL based, but needed to have its stuff copied across.
FogBUGZ cases come in through mail, so by turning FogBUGZ off and handling all support through email (which meant I couldn’t refer to earlier cases or do the other things that make this job possible), I could migrate its data. I started that on Tuesday at about 6pm.
26 hours later, it was done, and—happily—it came right up and seemed to work.
At this point, I turned off the forums and blog and pulled them over to the new server, using the same technique as before.
Goodbye Cruel Spam
Next up: Kerio Mail Server. That was going to take a while, but there was no getting around it. I installed a fresh copy of the latest Kerio release, then turned Kerio off on the real server. That would delay any inbound mail, but it would queue at the previous hop and be retried. Then I used tar over ssh to get things across:
ssh g5server "( cd /usr/local/kerio/mailserver && tar cf - Store )" | (cd /usr/local/kerio/mailserver && tar xvpf - )
It took a few hours (lots of mail), but all the content came across, and I used scp to copy the other few individual files that needed to migrate over.
Fear of Commitment
The moment of truth. I turned off Apache and took down Shirt Pocket. Then I quickly re-imported Mint (to bring over any stats I’d missed), and powered off the iMac and the real server.
Chatter Chatter
I love the way drives go into the G5. It’s so accessible, and so easy (OK, so drive A is a pain because you have to remove drive B, but still—great stuff), and this was as easy as pie. Old drive out, new drive in, close up, power on and… nothing.
It just wouldn’t boot.
Oh crap.
Beginning to Panic, and not in the WWPD sense
At this point, the system was headless, so I had no way of seeing what was going on. I quickly powered off, snagged a monitor and keyboard, and plugged everything in, being careful to smear flopsweat all over each critical connection.
First, I option-booted and… the drive was present, but Open Firmware never released the “Watch” cursor. It just sat there, spinning and taunting. It never moved on; the fans started to spin faster and faster since there was no OS to manage them, and… powered off again.
Tried resetting PRAM and using Open Firmware to reset both PRAM and NVRAM: no go. Removed the B-drive (which wasn’t totally necessary): no.
Clock’s ticking. Server’s down. Customers running away from a company that looks like it’s gone out of business.
Mommy!
OK, deep breath, shot of espresso, think. Open Firmware’s probing for devices. What else is connected? AH! There’s a FireWire drive connected. For some reason, it’s decided to lock the bus at this exact moment. Bastard! Powered that puppy off, disconnected it, started the boot again and…
and…
comes up! Login window! W00t!
The Woods are Lovely, Dark and Deep
I quickly logged in, and brought up the Server Admin to turn on DNS and make sure everything was OK. And I was greeted by something really, really bad.
Server had decided that my serial number had “already been registered”. It had shut down all services. Nothing was running. My totally legally purchased serial number, my only serial number, my $500, isn’t-even-a-developer-license, I-bought-the-damn-thing-at-the-Apple-Store serial number, was “already registered”. On the iMac. When I did the install to make this easier.
You Bastards!
But that server wasn’t even running! I had only installed it to this one drive, and that drive was moved from machine to machine. It should have been fine!
But it’s not. I’m totally, completely screwed. It’s way after hours, I can’t get the server running, Apple’s not there. I can’t go back to Panther—I’d need to spend another few days re-importing all that data.
They must be playing some Bonjour games to have that serial number float around the network in DNS or something.
Crap. Crapcrapcrap!
Bare to the Bone
There was only one thing I could think to do: whine. Rich Siegel was online, even though it was late, so I popped a message to him so he could laugh at me and my serverless foolishness. (Not that he needs an excuse to laugh at me.)
Which, of course, he did, but then—after wiping tears of joy from his eyes—pulled a rabbit out of a hat. He had a key! I could use that temporarily, until I could get whatever magic incantation I needed to get my real key working!
I typed in the key furiously (pun intended) and… phew. Server up.
It’s alive! Alive! Shirt Pocket Lives!
Cleanup
After I lowered the server down from the opening in the roof, and carefully unwrapped its bandages, things were looking pretty good. Everything was up. Everything was working. The mail was back up, running, and (alas) flooding in.
Boing!
And bouncing. Almost everything. For some reason, mail from the Forums and from FogBUGZ wasn’t going through my relay server. Instead, it was being sent directly from my server, which shouldn’t be happening.
And everything indicated it was going through Postfix. Which I wasn’t using: I had explicitly set things up to use Kerio. But, on examination of the Kerio logs, it wasn’t getting some messages. Postfix was.
You Lie Like Dog
So, I looked again in the Server Admin and… mail was off. SMTP was off. All that stuff was handled by Kerio, or should have been. But wasn’t. For some reason, Postfix was snagging stuff, even though it was supposedly off.
I messed around for a while, but at this point it was 2am, and I was exhausted. It’d been a few days since I started the Migration Adventure, and I wasn’t thinking clearly any more, so… sleep.
Coffee Achiever!
Next morning, caffeinated and slightly more awake. Same problem. But, an idea.
I took a look at the daemons in launchd and, sure enough, org.postfix.master was in there, even though Mail was off in Server Admin. It was set up to run on demand when anything hit port 25, but—and here’s the weird part—only from localhost. So when FogBUGZ sent to localhost:25, launchd triggered Postfix, which answered instead of Kerio on the loopback.
Man. OK: so, disable Postfix. Some tweaking later I found the trick to getting it unloaded for good:
launchctl unload -w /System/Library/LaunchDaemons/org.postfix.master
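To double-check that it stayed gone, launchctl can list what’s loaded; no output here means no Postfix:

sudo launchctl list | grep postfix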
A quick system restart, and victory.
And so, we’re back up. Sorry for any bounced mail, missing site, vanished forums that happened during the process. Hopefully, it won’t happen again for a while.
A few days ago, Shigeru Harada of MacFreak contacted me with a strange problem: he was unable to schedule backups with SuperDuper. Every time he tried, he’d get an error that the script wouldn’t compile.
In the past, we’ve seen this when drives were named with slash characters, but his were quite normal.
After some back-and-forth, I had him take the script template we use, and try to compile it himself, in Script Editor. Shockingly—it failed to compile, and the problem didn’t make any sense.
If you’re familiar with the Japanese keyboard, you know the backslash key (\) is replaced by the symbol for the Yen (¥). Way back when, we did a Japanese version of BRIEF, so I was familiar with this phenomenon—paths would be separated by Yen symbols, but everything worked as expected.
But, in AppleScript, it seemed the backslash/yen “swap” completely prevented backslash from doing its normal thing, so this:
set the URL_A_chars to "$+!',?;&@=#%><{}[]\"~`^\\|*()"
completely failed to compile, because it looked like this:
set the URL_A_chars to "$+!',?;&@=#%><{}[]¥"~`^¥¥|*()"
and ¥ didn’t escape as you’d expect.
A huge surprise to me. I did find it discussed in one place (thank you Google and Takaaki Naganoya), and also a reference to bug fixes in Tiger (Backslash characters and Yen sign characters will now compile correctly when the primary language is set to Japanese. [3765766]), but it didn’t seem to be fixed for him—probably because he’s using Panther (which we need to support).
Anyway, workaround in hand, I modified the script to:
set quoteChar to ASCII character 34
set backslashChar to ASCII character 92
set the URL_A_chars to "$+!',?;&@=#%><{}[]" & quoteChar & "~`^" & backslashChar & "|*()"
Here’s hoping it works… I hope it’s not a problem with chevrons («») too, because I have no idea how I’d work around that one…
I read Wolf’s I/O error treatise this morning (as well as Alastair’s response), and thought I’d write a bit about how SuperDuper! actually handles I/O errors, and why. (In fact, this is an expansion and reworking of some email I dropped to Jonathan after reading the article.)
Although Wolf says otherwise, ditto isn’t our underlying engine. We use a variety of APIs in Cocoa and Carbon, augmented with much additional metadata copying. However, when we get a failure with those (such as an I/O error), we retry twice more: once with copyfile and once—just in case—with ditto, verifying after each one.
We do this because we’ve seen the rare case where one API fails but others do not. Weird, I know, but it happens.
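To show the shape of the idea, here’s a sketch using CLI stand-ins; it’s not our actual code, which uses the copyfile API rather than cp:

# Sketch: try one copier and verify; fall back to ditto and verify again.
copy_with_fallback() {
    cp -p "$1" "$2" && cmp -s "$1" "$2" && return 0
    /usr/bin/ditto "$1" "$2" && cmp -s "$1" "$2" && return 0
    return 1   # both attempts failed: stop and diagnose
}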
If all three attempts fail, we stop. This is done for safety: an I/O error could mean the drive is failing, and since you’re dealing with a live backup, it’s important to understand what’s going on. If a significant failure is occurring, steps should be taken to concentrate on recovering your user files, rather than trying to copy the whole drive. We don’t want a user (or SuperDuper!) to continue past the failure: we want them to stop, diagnose and—if necessary—get help. And since most users won’t know what to do (unlike Wolf or Alastair, clearly), we make it really easy to contact support.
Our User’s Guide has a Troubleshooting section that helps a user determine whether the error is on the source or destination (I don’t explain there how to use the system log and a System Profiler report to locate the source, because it’s pretty obscure stuff—the amount of detail in our log is confusing enough for most), as well as general steps for recovery. But in most situations, 4K of 0s will pretty much be a fatal problem for the file. (I’m shocked, frankly, that Wolf’s Parallels disk was OK given the damage: he was very lucky.)
Most of the time, the problem is actually an iSight camera, iPod, or other bus-powered device misbehaving on the FireWire bus. On occasion, the problem is with the source.
Errors on the source are problematic. As Alastair mentions, modern disk controllers transparently relocate sectors when errors occur. Real problems happen when the drive’s out of spares, or when the on-disk error correction can’t handle the failure. And at this point, the drive has probably been silently failing for a while.
In many cases, SMART status will flag a drive that’s failing badly—SMART Reporter, a nice bit of freeware from Julian Mayer, can give you an obvious warning when this occurs, or even run a program (like SuperDuper!) to do a quick backup of critical files. But, often, it won’t, and experienced guidance and advice is necessary to help people understand what’s going on.
Anyway, as Wolf’s article indicates, and Alastair agrees, it’s very difficult to continue in a way that ensures data is preserved as much as possible. It’s hard to know what really happened without being there, and an automated fix isn’t guaranteed. So, we’re super conservative. And while it’s obviously labor intensive, we think injecting a human into the process at this point provides the user with the best outcome.
Users often ask me for drive recommendations, and while I provide general suggestions for FireWire drives in the SuperDuper! User’s Guide, I haven’t suggested any Network Attached Storage (NAS) devices, mostly because I haven’t been entirely happy with the ones I’ve tested.
In general, most NAS units are Linux-based, with a drive that’s formatted as ext3, which allows a large sparse image to be stored on the drive. This works well with SuperDuper!, and lets us properly preserve permissions and metadata with full fidelity. Some, like the LaCie “mini” Ethernet drive, are shipped as FAT32, and need to be reformatted as ext3 (or HFS+, or NTFS) before they can be used.
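Creating such a sparse image from the Mac side is easy. With a placeholder size and share name, something like this works (as I recall, hdiutil adds the .sparseimage extension for you):

hdiutil create -size 300g -type SPARSE -fs HFS+J -volname Backup /Volumes/NASShare/Backup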
I’ve tested the Buffalo Linkstation and Gigastation, the LaCie mini ethernet drive, the Linksys NSLU2, and a number of others, and while they all work reasonably well when configured properly, none have been recommendation-worthy.
The units I’ve tried in the past have all had very old AFP (Apple’s network protocol) support, old enough not to support files larger than 2GB or so (even though their file systems could handle them), mostly because they’re all based on an old version of Netatalk. SMB can be used instead, and that does work, but you have to know to do it… and most users don’t. This means their backups would fail in various weird ways.
On top of that, these NAS units are pretty insanely slow. Even the Gigastation, using Gigabit Ethernet, did an initial backup running at less than 1MB/s. This improved once the first copy was done, but it’s painful the first time through.
So… nothing to recommend.
Until now.
Over the last week or so, I’ve been doing extensive testing with the Terabyte version of Infrant’s ReadyNAS NV, and I’m very impressed: impressed enough to give it a big thumbs up.
The ReadyNAS linked above is a 4-drive X-RAID unit (four 250GB drives set up for fault tolerance and speed, using Seagate’s Nearline SATA drives, which—like Maxtor’s MaxLine III drives—are designed for longer life and more heat tolerance than regular drives) in an attractive case, with GigE capability. Like its competitors, the ReadyNAS runs Linux, but unlike the others, Infrant is fanatical about keeping things up to date, and supports the most recent Netatalk, which properly handles large files.
On top of that, they’ve spent a lot of time optimizing the unit, and it actually delivers very good performance (I got about 5MB/s on a full copy), far better than the others, even over AFP. (More AFP-specific optimizations are promised for the future, too.)
But the goodness doesn’t stop there. Infrant knows that their OS can do more than just serve up shared storage, and they’ve gone ahead and created a nice port of Slim Devices’ SlimServer, which runs on the ReadyNAS and can serve music around the house, through real Squeezeboxen or the Softsqueeze player. (In fact, Slim Devices is offering a bundle of a terabyte ReadyNAS NV and two wireless Squeezeboxen for $1499.) Infrant also supports UPnP streaming, etc, so it’s likely to work with the solution you already have.
And since they have good AFP support, you can point iTunes to its media share, and rip directly to it. Nice!
The ReadyNAS is not inexpensive (although the easily expanded 250GB model isn’t too bad), and you might wonder whether you should just get a Mac mini with a FireWire drive instead. While you certainly could do that, the RAID capabilities built into the ReadyNAS are invaluable when you’re dealing with critical data. Infrant has clearly designed their unit to be rugged and reliable, and that’s obviously important. A failing (or failed) drive can be hot-swapped for a new one, so your data remains safe even in the event of that inevitability.
The only real downside of the Infrant—other than the expense—is that it’s got a fan. Heat is the enemy of hard disks, so it’s important for the fan to be there, and they’ve made it temperature sensitive so it only runs as fast as it needs to. But, it’s not silent (although it’s not that bad), so if that’s an issue, this isn’t the unit for you.
Otherwise, a very enthusiastic recommendation: Infrant has done some really great work here.