SuperDuper!
I/O Error Recovery Monday, June 19, 2006
I read Wolf’s I/O error treatise this morning (as well as Alastair’s response), and thought I’d write a bit about how SuperDuper! actually handles I/O errors, and why. (In fact, this is an expansion and reworking of some email I dropped to Jonathan after reading the article.)
Although Wolf says otherwise, ditto isn’t our underlying engine. We use a variety of APIs in Cocoa and Carbon, augmented with much additional metadata copying. However, when we get a failure with those (such as an I/O error), we retry twice more: once with copyfile and once—just in case—with ditto, verifying after each one.
We do this because we’ve seen the rare case where one API fails but others do not. Weird, I know, but it happens.
If all three attempts fail, we stop. This is done for safety: an I/O error could mean the drive is failing, and since you’re dealing with a live backup, it’s important to understand what’s going on. If a significant failure is occurring, steps should be taken to concentrate on recovering your user files, rather than trying to copy the whole drive. We don’t want a user (or SuperDuper!) to continue past the failure: we want them to stop, diagnose and, if necessary, get help. And since most users won’t know what to do (unlike Wolf or Alastair, clearly), we make it really easy to contact support.
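To make the try, verify, fall back, then stop sequence concrete, here is a minimal Python sketch of the general pattern. It is not SuperDuper!'s engine code: the names copy_with_fallbacks, files_match, failing_copy and CopyFailed are hypothetical, and Python's shutil functions merely stand in for the Cocoa/Carbon, copyfile and ditto paths described above.

```python
import errno
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

# A copy strategy takes (source, destination) and raises OSError on failure.
CopyFn = Callable[[Path, Path], object]


class CopyFailed(Exception):
    """Raised when every strategy has failed, so the caller can stop the backup."""


def files_match(a: Path, b: Path, chunk: int = 1 << 20) -> bool:
    """Byte-for-byte comparison used to verify each attempt."""
    if a.stat().st_size != b.stat().st_size:
        return False
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            block_a, block_b = fa.read(chunk), fb.read(chunk)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted with no differences
                return True


def copy_with_fallbacks(
    src: Path, dst: Path, strategies: Sequence[Tuple[str, CopyFn]]
) -> str:
    """Try each strategy in order, verifying after each one.

    Returns the name of the first strategy that produced a verified copy.
    Raises CopyFailed if none did, rather than continuing past the error.
    """
    errors: List[str] = []
    for name, copy_fn in strategies:
        try:
            copy_fn(src, dst)
            if files_match(src, dst):
                return name
            errors.append(f"{name}: verification mismatch")
        except OSError as exc:  # includes I/O errors and a missing destination
            errors.append(f"{name}: {exc}")
    raise CopyFailed(f"all attempts failed for {src}: " + "; ".join(errors))


def failing_copy(src: Path, dst: Path) -> None:
    """Hypothetical stand-in for an API that hits an I/O error."""
    raise OSError(errno.EIO, "simulated I/O error")


def demo() -> str:
    """One API fails, the next succeeds: the 'weird but it happens' case."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "source.bin"
        dst = Path(tmp) / "backup.bin"
        src.write_bytes(b"important data" * 1000)
        strategies = [
            ("primary", failing_copy),      # stand-in for the first API
            ("fallback", shutil.copyfile),  # stand-in for the second attempt
            ("last resort", shutil.copy2),  # stand-in for the third attempt
        ]
        return copy_with_fallbacks(src, dst, strategies)
```

The design point is the final raise: once every attempt has failed, the sketch surfaces the collected errors and halts instead of skipping the file, which mirrors the conservative stop-and-diagnose behavior described above.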
Our User’s Guide has a Troubleshooting section that helps a user determine whether the error is on the source or destination (I don’t explain there how to use the system log and a System Profiler report to locate the source, because it’s pretty obscure stuff—the amount of detail in our log is confusing enough for most), as well as general steps for recovery. But in most situations, 4K of 0s will pretty much be a fatal problem for the file. (I’m shocked, frankly, that Wolf’s Parallels disk was OK given the damage: he was very lucky.)
Most of the time, the problem is actually an iSight camera, iPod, or other bus-powered device misbehaving on the FireWire bus. On occasion, the problem is with the source.
Errors on the source are problematic. As Alastair mentions, modern disk controllers transparently relocate sectors when errors occur. Real problems happen when the drive’s out of spares, or when the on-disk error correction can’t handle the failure. And at this point, the drive has probably been silently failing for a while.
In many cases, SMART status will flag a drive that’s failing badly—SMART Reporter, a nice bit of freeware from Julian Mayer, can give you an obvious warning when this occurs, or even run a program (like SuperDuper!) to do a quick backup of critical files. But, often, it won’t, and experienced guidance and advice is necessary to help people understand what’s going on.
Anyway, as Wolf’s article indicates, and Alastair agrees, it’s very difficult to continue in a way that ensures data is preserved as much as possible. It’s hard to know what really happened without being there, and an automated fix isn’t guaranteed. So, we’re super conservative. And while it’s obviously labor intensive, we think injecting a human into the process at this point provides the user with the best outcome.
Users often ask me for drive recommendations, and while I provide general suggestions for FireWire drives in the SuperDuper! User’s Guide, I haven’t suggested any Network Attached Storage (NAS) devices, mostly because I haven’t been entirely happy with the ones I’ve tested.
In general, most NAS units are Linux-based, with a drive that’s formatted as ext3, which allows a large sparse image to be stored on the drive. This works well with SuperDuper!, and lets us properly preserve permissions and metadata with full fidelity. Some, like the LaCie “mini” Ethernet drive, are shipped as FAT32, and need to be reformatted as ext3 (or HFS+, or NTFS) before they can be used.
I’ve tested the Buffalo Linkstation and Gigastation, the LaCie mini Ethernet drive, the Linksys NSLU2, and a number of others, and while they all work reasonably well when configured properly, none have been recommendation-worthy.
The units I’ve tried in the past have all had very old AFP (Apple’s network protocol) support, old enough to not support files larger than 2GB or so (even though their file system could handle larger), mostly because they’re all based on an old version of Netatalk. SMB can be used instead, and that does work, but you have to know to do it… and most users don’t. This means their backups would fail in various weird ways.
On top of that, these NAS units are pretty insanely slow. Even the Gigastation, using Gigabit Ethernet, did an initial backup running at less than 1MB/s. This improved once the first copy was done, but it’s painful the first time through.
So… nothing to recommend.
Until now.
Over the last week or so, I’ve been doing extensive testing with the Terabyte version of Infrant’s ReadyNAS NV, and I’m very impressed: impressed enough to give it a big thumbs up.
The ReadyNAS linked above is a 4-drive X-RAID unit (which has four 250GB drives set up for fault tolerance and speed, using Seagate’s Nearline SATA drives which—like Maxtor’s MaxLine III drives—are designed to have a longer life and more heat tolerance than regular drives) in an attractive case, with GigE capability. Like its competitors, the ReadyNAS runs Linux, but unlike the others they’re fanatical about keeping things up to date, and support the most recent Netatalk, which properly handles large files.
On top of that, they’ve spent a lot of time optimizing the unit, and it actually delivers very good performance: I got about 5MB/s on a full copy, far better than the others, even over AFP. (More AFP-specific optimizations are promised for the future, too.)
But the goodness doesn’t stop there. Infrant knows that their OS can do more than just serve up shared storage, and they’ve gone ahead and created a nice port of Slim Devices’ SlimServer, which runs on the ReadyNAS and can serve music around the house, through real Squeezeboxen or the Softsqueeze player. (In fact, Slim Devices is offering a bundle of a terabyte ReadyNAS NV and two wireless Squeezeboxen for $1499.) Infrant also supports UPnP streaming, etc., so it’s likely to work with the solution you already have.
And since they have good AFP support, you can point iTunes to its media share, and rip directly to it. Nice!
The ReadyNAS is not inexpensive (although the 250GB, easily expanded one isn’t too bad), and you might wonder whether you should just get a Mac mini with a FireWire drive instead. While you certainly could do that, the RAID capabilities built into the ReadyNAS are invaluable when you’re dealing with critical data. Infrant has clearly designed their unit to be rugged and reliable, and that’s obviously important. A failing/failed drive can be hot swapped for a new one, so your data remains safe even in the event of that inevitability.
The only real downside of the Infrant—other than the expense—is that it’s got a fan. Heat is the enemy of hard disks, so it’s important for the fan to be there, and they’ve made it temperature sensitive so it only runs as fast as it needs to. But, it’s not silent (although it’s not that bad), so if that’s an issue, this isn’t the unit for you.
Otherwise, a very enthusiastic recommendation: Infrant has done some really great work here.
Rowr! Sunday, June 11, 2006
Next week, we’ll be releasing v2.1.2 of SuperDuper! with various improvements, including Growl support. I think you’re all going to like it.
We’d been investigating various methods of notifying users of successful and failed copies for some time, especially for users who are backing up headless or remote systems, and decided to leverage Growl’s well-tested and mature functionality rather than roll our own. It’s a nice package, and I’m glad it’s out there.
There’s a ton of flexibility built into Growl: you can show status visually with various types of floating panels and pop-ups, mail to any account, and even forward notifications to other machines on your network.
Putting this into SuperDuper! initially seemed pretty easy, but we ran into some unusual problems with our AppleScript-based “schedule driver” that I thought might prove interesting.
Obviously, we couldn’t require Growl—a 3rd party application—to be present on a user’s system. The idea here is to use Growl if present, not mandate it. But, our schedule driver—which can be extended by the user—is compiled “on-the-fly” when a schedule is created or modified.
But, while AppleScript has various techniques for dealing with a missing application or dictionary when running a compiled script, it really, really, really wants it to be present during compilation. So, a statement like
tell application "GrowlHelperApp"
notify with name "Scheduled Copy Succeeded" title "SuperDuper! Copy Succeeded" description "Copy was successful." application name "SuperDuper!"
end tell
won’t work if GrowlHelperApp isn’t present on the user’s system. Drat.
AppleScript does, however, have something called “raw event syntax”. Basically, you pre-compile your statement, hand-compiling it into Apple Events. Thus:
tell application "GrowlHelperApp"
«event notifygr» given «class name»:"Scheduled Copy Succeeded", «class titl»:"SuperDuper! Copy Succeeded", «class desc»:"Copy was successful.", «class appl»:"SuperDuper!"
end tell
But, this won’t work either: even though there’s nothing in the tell block that needs the dictionary, the tell itself will try to reference GrowlHelperApp, and fail. And if GrowlHelperApp is there, your AppleEvents will be magically transformed into AppleScript when you compile!
So, there’s no choice here—you have to have a tell block, or the events don’t go to the right application. But you can’t “tell” an application and have the compilation succeed if the user doesn’t have Growl installed! Or can you…
Fortunately, a little evil goes a long way in AppleScript. By using a string variable, rather than a string literal, we can prevent the compiler from loading the dictionary at compile time—and since we’re using raw event syntax, there’s no error. Thus:
set growlAppName to "GrowlHelperApp"
tell application growlAppName
«event notifygr» given «class name»:"Scheduled Copy Succeeded", «class titl»:"SuperDuper! Copy Succeeded", «class desc»:"Copy was successful.", «class appl»:"SuperDuper!"
end tell
Bingo! This actually allowed compilation… and worked fine when tested in a simple script. But, in the real one, it didn’t work. Instead, I got the following runtime error:
Finder got an error: application “GrowlHelperApp” doesn’t understand the «event notifygr» message.
Now, of course, it does understand the message, because it worked in the simple case!
It took a while for me to figure it out, but the difference was that in the real script, the Growl notification was in a nested tell block. And, even though the “nearest tell” is for GrowlHelperApp, for some reason this was seen as Finder’s GrowlHelperApp, rather than mine. (You’d think they’d be the same, but not so much.)
The solution, after much gnashing of teeth, was to explicitly reference the script’s main execution context, with:
set growlAppName to "GrowlHelperApp"
tell my application growlAppName
«event notifygr» given «class name»:"Scheduled Copy Succeeded", «class titl»:"SuperDuper! Copy Succeeded", «class desc»:"Copy was successful.", «class appl»:"SuperDuper!"
end tell
I’m telling you, AppleScript has the unique ability to make me feel like the stupidest developer on the face of the earth. Kudos to you, AppleScript!
Anyway, expect this early this week…
Luminous Love Tuesday, May 09, 2006
It looks like Michael Barrish of Luminous has something important to say:
I love SuperDuper and I don’t care who knows.
Awww. So cute!
(Thanks for the public display of affection, Michael!)
Enclosure Reliability Tuesday, May 09, 2006
James Wiebe, of WiebeTech, has written a terrific white paper that exhaustively analyzes the reliability of external enclosures, detailing the factors that contribute to failures.
As we’ve known for some time here at Shirt Pocket, large capacity external drives that use multiple physical drives in a single enclosure to boost their performance and capacity are much less reliable than single drive units.
As far as I’m concerned, anyone who relies upon hard drives for storage, backup or otherwise, should read and understand this paper.
Thanks for writing it, James!
Hoarse-ing around Thursday, May 04, 2006
Despite being down and mostly out with a cold/flu, I’ll be appearing on The Tech Night Owl LIVE tonight, May 4th, talking about SuperDuper!, netTunes, launchTunes, and desperately trying to not cough up a chunk of lung.
You can tune into the broadcast from 6:00 to 8:00 PM Pacific, 9:00 to 11:00 PM Eastern, at http://www.techbroadcasting.com. An archive of the show will be available for downloading and listening within four hours after the original broadcast.
You can also access the show’s Podcast feed, available at: http://www.techbroadcasting.com/nightowl.xml.
Macworld UK Nomination! Monday, April 24, 2006
Looks like SuperDuper! 1.5.5 was nominated for a 2006 Editor’s Choice award by Macworld UK in the “Storage” category. Cool, and congratulations to the other nominees, too!
Exhaustive/exhausting Sunday, April 23, 2006
Maurits, of plasticsfuture, has posted an exhaustive, two-part review of backup programs on Mac OS X. The first part focuses on the general issue of backing up files on Mac OS X, and the second contains an extensive analysis of more backup programs than I’ve seen covered in one place: freeware, shareware and commercial.
In my blog, I tend to talk about the usability aspects of SuperDuper!, because that’s where I focus my efforts. I can only do that because Bruce and I have very high standards for the rest of SuperDuper!, and Bruce’s copy engine is second to none. In fact, in this review of 16 tools, SuperDuper! was the only tool that worked correctly:
The surprising conclusion is that almost all Macintosh backup or cloning programs do not fulfil (sic) their primary purpose, i.e., they are not able to restore files with all associated metadata. This is despite the fact that many of the tools are advertised as “safe”, “accurate”, “bug-free”, etc. The tools that fail are harmful because they generate a false sense of security. Even more exasperating is that many of these tools cost (significant amounts of) money. The only laudable exception is the great SuperDuper application, which performs flawlessly. (Emphasis mine.)
Many thanks to the pseudonymous Maurits for putting this whole thing together: it couldn’t have been easy to do.
I just wanted to give a bit of a Heads-Up to Teh Internets Users who might be relying on the Restore tab of the DVD that came with their Intel Mac.
Unfortunately, the copy of Disk Utility that’s on those DVDs (I’ve checked the one for the MacBook Pro and the iMac, but I don’t have an Intel Mac mini to test with) has a non-functioning Restore tab: the tab relies on drag-and-drop to set the destination volume (if you’re not using an image, the source as well), and—due to what looks to be a bug in this specific version of the utility—drag and drop does not work in the volume sidebar.
This means it’s not possible to restore a volume when booted from this DVD (regardless of how that volume was created). If you’re relying on the DVD’s restore functionality, I suggest installing a minimal system to a small partition on an external drive instead, as the Disk Utility that’s part of Tiger itself works just fine.