One of the interesting things that happens when you do a “live” backup (that is, back up the volume that you’re booted from) is that you copy files that are currently open.
Normally, this isn’t a huge problem. But, on occasion, things go horribly wrong. This is a story of one of those things that happens every so often with SuperDuper, and what we’ve done about it. It’s a mite technical, but hopefully interesting.
As you probably know, under OS X, files have an owner, a group, and a “mode”. These three things work together to determine who can read, write, execute, and do other things to the files on your system.
Less well known are the “flags” that are associated with files as well. For example, the “immutable” flag, uchg, can be set to prevent a file from being modified, regardless of its “mode”. There are a number of flags that can be associated with files, and SuperDuper needs to replicate them properly when copying.
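For the curious, here’s roughly what replicating that metadata looks like at the system-call level. This is a minimal sketch built on the standard BSD calls (lstat, chown, chmod, chflags), not SuperDuper’s actual code; the replicate_metadata function and its error handling are purely illustrative:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>   /* lstat, chmod, chflags */
#include <unistd.h>     /* chown */

/* Copy owner, group, mode, and BSD flags from src to dst.
 * Flags must go last: if dst picked up uchg first, the chown
 * and chmod calls would start failing. */
static int replicate_metadata(const char *src, const char *dst)
{
    struct stat st;

    if (lstat(src, &st) != 0) {
        fprintf(stderr, "lstat %s: %s\n", src, strerror(errno));
        return -1;
    }
    if (chown(dst, st.st_uid, st.st_gid) != 0)
        return -1;
    if (chmod(dst, st.st_mode & 07777) != 0)
        return -1;
    /* st_flags carries uchg, uappnd, schg, sappnd, arch, ... */
    if (chflags(dst, st.st_flags) != 0) {
        fprintf(stderr, "chflags %s: %s\n", dst, strerror(errno));
        return -1;
    }
    return 0;
}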
So far so good.
Every so often, though, we get reports of users who have files on their destination volumes that can’t be removed during Smart Update. The error always looks like this:
| 06:02:18 PM | Info | Warning: error clearing immutable flags: 77600100 for: /Volumes/G5/private/tmp/sort9qlI6c
| 06:02:18 PM | Info | Error removing item: /Volumes/G5/private/tmp/sort9qlI6c of type: 100000, Operation not permitted
| 06:02:18 PM | Error | SDCopy: Error deleting /Volumes/G5/private/tmp/sort9qlI6c
| 06:02:18 PM | Error | : Operation not permitted
We found that we needed to remove a file on the destination, but when we tried to clear its immutable flag, we couldn’t. The only workaround was to use Erase, then copy, which, while it worked, was inconvenient.
Recently, we were finally able to reproduce the issue in house, and what we found was quite curious.
When a file is being written at the same time we’re backing it up, it’s possible for that file to look “weird” in the file system. When that happens, it seems the directory entry isn’t completely updated, and, in this unusual state, some flags are set that produce the behavior above. Here’s that file’s directory entry, listed with “ls -lo” (the -o lists the flags, too):
---s-wS--- 1 nobody 3221220928 sappnd,arch,schg 524237 Dec 31 1969 sort9qlI6c
So, weird stuff. The group is complete garbage, and the sappnd and schg flags are set. And—worse—since the system returned that junk to us, we faithfully replicated it on the copy.
Why worse, you might ask? Because while you can set the sappnd and schg flags, you cannot clear them while booted normally. And schg is the “system immutable” flag: it prevents the file from being deleted. Once this flag is set, that file’s going nowhere.
The reason for this is that FreeBSD, the Unix that OS X is built on top of, has a concept of “security levels”. By default, our Macs boot up and run at securelevel 1, which prevents anyone, even with escalated privileges (such as “root” access), from writing to certain device files like /dev/mem and /dev/kmem. And, at securelevel 1, the schg and sappnd flags cannot be removed.
And how can you get to securelevel 0 or -1, where they can be? You have to boot into single user mode.
(You can learn some more about the various BSD security levels at this page.)
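If you’re curious, you can check which securelevel you’re currently running at with a quick sysctl query. Here’s a minimal sketch; kern.securelevel is the real sysctl name, and the rest is just illustrative glue:

#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

int main(void)
{
    int level = 0;
    size_t len = sizeof(level);

    if (sysctlbyname("kern.securelevel", &level, &len, NULL, 0) != 0) {
        perror("sysctlbyname");
        return 1;
    }
    /* Expect 1 on a normal boot; 0 or -1 in single user mode,
       where schg and sappnd can finally be cleared. */
    printf("kern.securelevel = %d\n", level);
    return 0;
}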
Yikes! What that means is once you’ve set these flags, you’ve basically got to go super-geek to unset them.
It’s clear we can’t ask our users to reboot into single user mode to fix this kind of thing, so we decided the best approach was to detect the “garbage” case as best we could and, in that situation, mask out the schg and sappnd flags when copying. The data we’re given by the OS is still wrong, but at least the copy won’t suffer for it, and the user won’t have to erase their backup next time around.
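In rough terms, the fix looks like the sketch below. This isn’t our shipping code: looks_like_garbage is a hypothetical stand-in for the detection heuristic (spotting the kind of half-written entry shown earlier), and the flag masking is the important part:

#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>   /* chflags, SF_IMMUTABLE, SF_APPEND */

/* Hypothetical heuristic: spot the half-written directory
   entries described above (garbage group, absurd size, etc.). */
extern bool looks_like_garbage(const struct stat *st);

static int apply_flags(const char *dst, const struct stat *src_st)
{
    u_int flags = src_st->st_flags;

    if (looks_like_garbage(src_st)) {
        /* Never propagate schg (SF_IMMUTABLE) or sappnd (SF_APPEND)
           from a suspect entry: once set on the copy, they can't
           be cleared at securelevel 1. */
        flags &= ~(SF_IMMUTABLE | SF_APPEND);
    }
    return chflags(dst, flags);
}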
This fix, as well as some other stuff I’ll be talking about over the next few days (including native Intel support), will be coming in v2.1 of SuperDuper—keep an eye out for it!
20 Feb 2006 at 09:59 pm | #
If the file is being written while the backup is being performed, would it be possible for SuperDuper! to essentially flag that file for follow-up internally? What I mean is, if SuperDuper could detect that a file is currently being written, it could remember which file it is and move on to the rest of the backup, then at the very end return to that file and try to copy it again. If SD runs into the same problem again, it could then use your solution. It seems to me that this would provide for a more reliable backup and reduce the risk of getting a corrupted file on your backup. As someone studying computer science, I’d be very interested to hear your response. Thanks!
20 Feb 2006 at 10:23 pm | #
Yes, it could be flagged and added to a queue for post-processing. But there’s no guarantee that it’ll be “correct” the next time around, and, more importantly, adding a second pass significantly complicates the algorithms used, with the inherent disadvantages that result. (For example, locked upper-level folders cause lower-level files and folders to be immutable, and if those are files you have to retry, you need to backtrack, unlock, re-lock… you get the idea.)
It’s really hard, especially when you’re trying to detect bizarre failures heuristically, not to overcomplicate a solution. And since we don’t know why the file is strange on the source, nor when that situation is going to turn around, it’s safer, and far easier to test (critical for this kind of application), to take a less complex approach. It’s more easily verified, and, should the solution prove insufficient, more easily extended.
22 Feb 2006 at 07:06 am | #
Damn right it is a hard problem. It happens with live backups of heavily-used databases, where uncommitted transactions are so mind-blowingly destructive that database backup is a special case; bizarre failures are deadly.
My opinion, for what it’s worth, is that a user-level file backup program may reasonably note and log open and locked files rather than attempt to process them, with an optional retry at the end for good measure. Missing files, especially locked ones, aren’t so critical to users, who tend to be present during the backup anyway and can fix such niggles on-the-fly.
On the other hand, automated filesystem-level backup must use a method akin to database backup, such as transaction log journals, snapshots, and the “shadow volumes” on NT, because a guaranteed complete and consistent copy of the entire filesystem is a requirement for system-level disaster recovery. In this case, the recovery of those locked files becomes crucial, because they usually support some vital system function.
Apples and oranges, I suppose. It’s always hard to tell the difference. They’re both fruit.
22 Feb 2006 at 07:33 am | #
Indeed, which is why we have the before/after scripts in our Advanced settings: these are specifically designed to allow server users to checkpoint their databases so they’re not backed up “live”… a little mysqldump goes a long way toward alleviating this classic issue!
This whole area is an interesting one, and while we’ve never intended to address the “server” market as such (although we are on occasion used that way), we’re always trying to come up with better methods of handling edge cases…
22 Feb 2006 at 06:10 pm | #
I had no idea that backup was this complex. Now I feel that my registered copy of SuperDuper! is more valuable than ever. Thanks for the work. As I’ve written before, this program is most enjoyable for the great sense of mellowness that comes over me after using it. I’ve got a backup!!!
22 Feb 2006 at 06:26 pm | #
You bet: there’s a lot going on “behind the scenes”.
The trick is making it look easy!
05 Mar 2006 at 01:52 am | #
This is happening to me right now after I tried to trash a movie file before SD! copied it, b/c I forgot to ignore it in my copy script. Turns out I trashed it right when SD! was copying it. Now I’ve got a 1.44GB “alias” on my external FW clone. Turned off Open Firmware pw, booting into single-user…
24 Mar 2006 at 11:25 am | #
I still get the error “␀␀␀␀HFS+ Private Data: Operation not permitted” on a couple of folders. My only solution is to erase and then do a complete backup. I wasn’t able to delete the files (inside a folder) using “sudo rm -rf”; I get “not enough privileges”, and fixing permissions is no help. I don’t know what else to do.
24 Mar 2006 at 12:02 pm | #
The weird thing is that you’re seeing this file at all, Bert. HFS+ Private Data is a special folder that belongs at the top of a drive. It shouldn’t appear at all—have you done unusual things to this drive, folder, etc—manual copies, access from OS9, or the like?
Here’s a link to a bit of information about it; more detailed information is in Technote TN1150…
09 Oct 2006 at 04:37 pm | #
Hey Dave,
Thanks for the great software and excellent philosophy as well.
I have a client who’s experiencing this exact problem. Here’s a copy of her log file:
| 05:22:58 PM | Info | Warning: error clearing immutable flags: 77600160 for: /Volumes/iMac Backup/Users/stevenshari/Library/Caches/Safari/12/11/0837533639-3423022007.cache
| 05:22:58 PM | Info | Error removing item: /Volumes/iMac Backup/Users/stevenshari/Library/Caches/Safari/12/11/0837533639-3423022007.cache of type: 100000, Operation not permitted
| 05:22:58 PM | Error | SDCopy: Error deleting /Volumes/iMac Backup/Users/stevenshari/Library/Caches/Safari/12/11/0837533639-3423022007.cache
I’m not able to physically admin/troubleshoot the machine at the moment, but if I could just delete the cache file in question, would it solve the problem next time around?
Also, is this preventing the rest of the data from being properly backed up?
Thanks,
Michael
09 Oct 2006 at 04:49 pm | #
Sounds like the backup needs to be erased and redone; make sure the user is up to date with SD!, too!
09 Oct 2006 at 09:50 pm | #
Thanks for the quick reply… worked like a charm!
SuperDuper [indeed] !
~Michael
18 Jun 2007 at 09:56 am | #
Great article! Our link in one of the comments above has changed because of the SSIs the site’s using.
http://rixstep.com/2/20040621,00.shtml
Also we’ve noticed that HFS+ Private Data is now a rose by any other name. In fact there are three hidden items in root and what Apple are doing is giving them zero inodes to keep them under the radar.
Cheers.
19 Jun 2007 at 03:24 pm | #
Thanks, Rick. Yeah, we still see \\HFS+ Private Data show up in some weird places on drives, I think because a whole “drive” was copied by selecting all in the Finder with Cmd+A and then pasting in OS9… the file was hidden by giving it an “offscreen” location, but you can’t fool Cmd+A. Nasty!
19 Jun 2007 at 05:42 pm | #
Hey Dave,
Cool. They’re getting cheekier too. In the good old days you could see a discrepancy between stat.st_nlink for root and the number of entries returned by getdirentries. Today you can’t anymore.
Today they use another trick: they set the inode to 0 (zero). A zero inode is evidently a ‘marker’ for the file system when it’s on its way to removing a directory entry altogether. But I guess as these zero inode files are not queued for deletion nothing happens to them. So Apple change the name of their private data file and hide the two journal files there the same way.
And I wish bigod I knew how to get at those suckers!
Cheers.