Warning: this is a technical post, put here in the hopes that it'll help someone else someday.
For years we've had a problem where our Backup on Connect LaunchAgent produces a ton of logging after a drive is attached and a copy is running. The logging looks something like:
9/2/18 8:00:11.182 AM com.apple.xpc.launchd[1]: (sdbackuponmount) Service only ran for 0 seconds. Pushing respawn out by 60 seconds.
Back when we originally noticed the problem, over 5 years ago, we "fixed" it by adjusting ThrottleInterval to 0 (found experimentally at the time). It had no negative effects, but the problem came back later and I never could understand why...certainly, it didn't make sense based on the man page, which says:
ThrottleInterval <integer>
This key lets one override the default throttling policy imposed on jobs by launchd. The value is in seconds, and by default, jobs will not be spawned more than once every 10 seconds. The principle behind this is that jobs should linger around just in case they are needed again in the near future. This not only reduces the latency of responses, but it encourages developers to amortize the cost of program invocation.
So. That implies that jobs won't be spawned more often than every n seconds. OK, not a problem! Our agent processes the mount changes quickly, launches the backups if needed, and quits. That seemed sensible--get in, do your thing quickly, and get out. We didn't respawn the jobs, and we processed all of the potential intervening mounts and unmounts that might happen during a 10-second "throttled" respawn.
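For reference, a mount-triggered LaunchAgent of this sort is configured with a plist roughly like the one below. This is only a sketch: the label, program path, and the assumption that the trigger is a WatchPaths entry on /Volumes are illustrative, not the actual Backup on Connect configuration.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.backuponmount</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/backup-on-mount</string>
    </array>
    <key>WatchPaths</key>
    <array>
        <string>/Volumes</string>
    </array>
    <key>ThrottleInterval</key>
    <integer>10</integer>
</dict>
</plist>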
It should have been fine... but wasn't.
The only thing I could come up with was that there must be a weird bug in WatchPaths where, under some conditions, it would trigger on writes to child folders, even though it was documented not to. I couldn't figure out how to get around it, so we just put up with the logging.
But that wasn't the problem. The problem is what the man page isn't saying outright, but implies in that last part: "jobs should linger around just in case they are needed again" is the key.
Basically, the job must run for at least as long as ThrottleInterval is set to (default = 10 seconds). If it doesn't run that long, launchd respawns the job after pushing the respawn out by some amount of time (the "Pushing respawn out by 60 seconds" in the log above), even when the triggering condition hasn't fired again.
So, in our case, we'd do our thing quickly and quit. But we didn't run for the minimum amount of time, and that caused the logging. launchd would then respawn us. We wouldn't have anything to do, so we'd quit quickly again, repeating the cycle.
Setting ThrottleInterval to 0 worked, when that was allowed, because we'd run for more than 0 seconds, so launchd wouldn't respawn us. But when they started disallowing it ("you're not that important")...boom.
Once I figured out what the deal was, it was an easy enough fix. The new agent runs for the full default 10-second ThrottleInterval. Rather than quitting immediately after processing the mounts, it sleeps for a second and processes them again, and it keeps doing that until it's been running for 10 seconds, then quits. A rough sketch of that loop is below.
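Here's a minimal sketch of that "linger for the full ThrottleInterval" approach, not the agent's actual code; processMounts() is a hypothetical stand-in for the real mount-checking work.

import Foundation

// Hypothetical stand-in for the real work: look at the mounted volumes
// and kick off any backups that are due.
func processMounts() {
    // ...
}

// Keep running for at least launchd's default ThrottleInterval (10 seconds)
// so launchd doesn't treat this as a too-short run and schedule a respawn.
let throttleInterval: TimeInterval = 10
let start = Date()

repeat {
    processMounts()
    Thread.sleep(forTimeInterval: 1)   // re-check the mounts once a second
} while Date().timeIntervalSince(start) < throttleInterval
// Only exit after the full interval has elapsed; launchd then has no reason
// to log "Service only ran for N seconds" and push the respawn out.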
With that change, the logging has stopped, and a long mystery has been solved.
This'll be in the next beta. Yay!