I’ve been using CrashPlan for the last 10 months – the combination of a low price, a 10-computer family pack, and Windows & Linux support meant it beat out BackBlaze as my online backup of choice. However, I’ve run into multiple problems with it, all in one day.
CrashPlan was seemingly stalling when trying to back up my latest set of pictures – it’d be stuck at “Analysing 2012-07-29 > _MG_6076.xmp” for a long time before (seemingly) moving on to the next .xmp file. This was ridiculous, so I looked into why it was doing that, and found a symptom:
0. CrashPlan was starting & stopping with no rhyme or reason.
I went looking for what could cause it to start & stop with such regularity, and found the first problem:
1. Their software is memory hungry. Ridiculously so.
By default, it’s set up to use a maximum of 256MB of RAM. This is a hard limit imposed on the Java VM when it runs. I have it running on my media server, which has been specced with 1GB of RAM. It’d regularly hit the max, but it didn’t seem to cause problems, so I chalked it up to the use of Java and left it at that. I’m not the only one who’s noticed this: one guy has it hitting 1.5GB of RAM.
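If you’re curious what yours is actually doing on Linux, the heap cap shows up right in the process arguments. This is plain ps; the grep assumes the engine appears as a java process with “crashplan” somewhere in its command line:

    # RSS is actual memory use (in KB); the -Xmx cap is visible in args
    ps -C java -o rss=,args= | grep -i crashplan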
However, the hard limit led me to another problem:
2. The CrashPlan program crashes and burns
I’ve been getting weekly reports on my backups since I installed it. I quite enjoyed this because my media server is headless, so set-it-up-and-forget-it backups were awesome. I’d periodically go in via VNC to check up on it, and the desktop interface reported everything was just fine.
Except it wasn’t. It seems that beyond a certain number of files, Java hits the hard memory limit and dies:
[07.31.12 08:13:52.777 ERROR QPub-BackupMgr backup42.service.backup.BackupController] OutOfMemoryError occurred...RESTARTING! message=OutOfMemoryError in BackupQueue!
Because the file set changes only irregularly, once it starts crashing it’ll keep crashing until you intervene and manually raise the memory limit. I raised mine to 1.5GB; Java is currently using only 932MB, so there’s some headroom for growth if necessary.
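If you want to check whether your own install is stuck in the same loop, the error above is easy to grep for. The log directory here is what I’d expect from a default Linux install – adjust if yours lives elsewhere:

    # Per-file counts of OOM errors across CrashPlan's logs
    grep -rc OutOfMemoryError /usr/local/crashplan/log/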
In the configuration directory, while hunting for the file that defines the memory limits, I found a bunch of restart.log files. Upwards of 260k of them. (I’m not kidding – the first time I tried to list the directory, I started a new SSH session to kill ls because it was Taking. So. Long. I actually thought ls had crashed.)
Each and every file seems to have been created when the CrashPlan Engine restarts. So that means CrashPlan ran out of memory and restarted at least 260 thousand times without me knowing.
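If you want to count yours without repeating my mistake, ls -f skips the sorting that makes plain ls crawl on a directory this size:

    # Run from CrashPlan's configuration directory
    ls -f | grep -c '^restart'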
Which leads me to the third problem:
3. Backups have at least one edge case where they’ll fail silently.
This is a screenshot of the most recent CrashPlan report that I got sent:
First is a laptop that hasn’t been connected, so I’ll ignore that. But helium is my media server, and that’s running just fine, right?
Nope. Backups have been failing since March 16, based on the first restart.log file I had.
I had an easy way to check – the main thing I was backing up was my pictures, and those are automatically sorted into folders based on the date they were taken. So after restarting CrashPlan, I pulled up a web browser to look at the restorable files.
And promptly had a mini freak-out. There was a gap between 2012-03-13 and 2012-07-29. And not just because I hadn’t been taking that many photos.
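The local folders covering the gap were sitting right there (the path here is made up; substitute your own photo directory):

    # Date-named folders that exist locally in the March-July gap
    ls -d ~/pictures/2012-0[3-7]-*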
Which means one thing: backups weren’t succeeding, but I was told everything was OK.
This was Not. Good. And I was Not. Impressed.
My “Last backup” times apparently meant “Last connected”. Which means:
4. The Backup status report is misleading
Ironically, the sample backup status report that CrashPlan has in their docs (dated June 15th, 2009) doesn’t have this problem – it shows the two times separately.
I imagine people saw the two times and were confused, so CrashPlan merged them. In my case, that’s an oops, since it covered up a serious problem.
Now, none of this is truly serious, for the simple fact that I haven’t lost any data, and I’m thankful for that. I’m just glad I found this before any data loss did occur, because I’ve been lax in backing up to an alternate location. In fact, it’s vaguely amusing how an easily fixed root cause (CrashPlan running out of RAM, presumably used to stash file metadata) coupled with an over-simplified status report and ineffective monitoring created a much more serious effect.
And how CrashPlan can probably fix this:
- Quick fix: Catch the OutOfMemory errors, and either tell the user or resolve it automatically.
Memory use seems to be based on the size of the files that people are backing up, not the number of files, which makes me think that their block detection/hashing algorithm is what’s chewing up memory. I’d guess that 99% of clients will probably never hit, say, 1.5TB of files.
But for the 1% that do, notify them instead of failing silently. I could have had the problem resolved quickly if I had been told about it. Missing a week (assuming weekly status reports) is far preferable to missing 4 and a half months.
Alternatively, since it’s just a flag on a command line, rewrite the flag automatically. UAC on Windows will cause some problems, but CrashPlan is writing to the Program Data folder perfectly fine, so moving the flags there should work. Some fancy scaling could be brought in (e.g. limit it to no more than 1/2 the system RAM, and up the size by 32MB each time you hit an OOM error) to make it even smoother – something like the sketch below.
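Since CrashPlan hasn’t built this, here’s a rough sketch of that scaling idea as an outside-the-app cron job on Linux. The log directory, the engine restart command, and the assumption of a single -Xmx flag are all guesses about a default install, not anything CrashPlan documents:

    #!/bin/sh
    # Bump CrashPlan's heap cap by 32MB whenever an OOM error shows up
    # in its logs, never exceeding half of physical RAM.
    CONF=/usr/local/crashplan/bin/run.conf
    LOGDIR=/usr/local/crashplan/log    # assumed default Linux location

    # Half of physical RAM, in MB (MemTotal is reported in KB).
    HALF_RAM=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 2048 ))

    if grep -rq OutOfMemoryError "$LOGDIR"; then
        # Assumes exactly one -Xmx flag in the config file.
        CUR=$(grep -o -- '-Xmx[0-9]*[Mm]' "$CONF" | tr -dc '0-9')
        [ -n "$CUR" ] || exit 0
        NEW=$(( CUR + 32 ))                  # up the size by 32MB
        [ "$NEW" -gt "$HALF_RAM" ] && NEW=$HALF_RAM
        if [ "$NEW" -gt "$CUR" ]; then
            sed -i "s/-Xmx[0-9]*[Mm]/-Xmx${NEW}M/" "$CONF"
            /usr/local/crashplan/bin/CrashPlanEngine restart
        fi
    fi

A real version would also need to remember which errors it had already handled, so one old OutOfMemoryError in the logs doesn’t keep bumping the limit forever.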
- Long term fix: Drastically reduce memory usage.
I’m backing up ~3TB of files, and it’s using just under a gig of RAM to hold everything. That sounds pretty suboptimal, especially considering that the error is a heap error (which tends to mean a memory leak somewhere). I can’t speculate much without knowing the internals, so I won’t.
And as far as I can tell, there are no docs on the CrashPlan website that explain how to raise the memory limit.
So for the record: under Windows the config file is at
C:\Program Files\CrashPlan\CrashPlanService.ini, and on Linux it’s at
/usr/local/crashplan/bin/run.conf.
You’re looking for the option “-Xmx256M” – I changed mine to “-Xmx1536M”. Pick a value that suits your file set and test it.
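On Linux, the whole change is a one-liner if you’d rather not open an editor – note the engine script path is my guess at a default install:

    sudo sed -i 's/-Xmx256M/-Xmx1536M/' /usr/local/crashplan/bin/run.conf
    sudo /usr/local/crashplan/bin/CrashPlanEngine restart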
Tip for Windows users on Vista/7: run your text editor as Administrator. We’re editing a file in Program Files, so UAC will silently block saves – which is incredibly annoying until you know what’s causing it.
Also, I thought they might at least standardize on a file location, if not a name, given that they’re using supposedly cross-platform Java. But nope!