I’ve been using CrashPlan for the last 10 months – the combination of a low price, a 10-computer family pack, and Windows & Linux support meant it beat out BackBlaze as my online backup of choice. However, I’ve run into multiple problems with it, all in one day.
CrashPlan was seemingly stalling when trying to back up my latest set of pictures – it’d be stuck at “Analysing 2012-07-29 > _MG_6076.xmp” for a long time before (seemingly) moving on to the next .xmp file. This was ridiculous, so I looked into why it was doing that, and found a symptom:
0. CrashPlan was starting & stopping with no rhyme or reason.
I went looking for what could cause it to start & stop with such regularity, and found the first problem:
1. Their software is memory hungry. Ridiculously so.
By default, it’s set up to use a maximum of 256MB of RAM. This is a hard limit imposed on the Java VM when it runs. I have it running on my media server, which has been specced with 1GB of RAM. It’d regularly hit the max, but it didn’t seem to cause problems, so I chalked it up to the use of Java and left it at that. I’m not the only one who’s noticed this: one guy has it hitting 1.5GB of RAM.
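If you’re curious what yours is actually doing on Linux, the heap cap shows up right in the process arguments. This is plain ps; the grep assumes the engine appears as a java process with “crashplan” somewhere in its command line:

    # RSS is actual memory use (in KB); the -Xmx cap is visible in args
    ps -C java -o rss=,args= | grep -i crashplan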
However, the hard limit led me to another problem:
2. The CrashPlan program crashes and burns
I’ve been getting weekly reports on my backups since I installed it. I quite enjoyed this because my media server is headless, so set-it-up-and-forget-it backups were awesome. I’d periodically go in via VNC to check up on it, and the desktop interface reported everything was just fine.
Except it wasn’t. It seems that beyond a certain number of files, Java hits the hard memory limit and dies:
[07.31.12 08:13:52.777 ERROR QPub-BackupMgr backup42.service.backup.BackupController] OutOfMemoryError occurred...RESTARTING! message=OutOfMemoryError in BackupQueue!
Because the file set changes only irregularly, once it starts crashing it’ll keep crashing until you intervene and manually raise the memory limit. I raised mine to 1.5GB; Java is currently using only 932MB, so there’s some headroom for growth if necessary.
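If you want to check whether your own install is stuck in the same loop, the error above is easy to grep for. The log directory here is what I’d expect from a default Linux install – adjust if yours lives elsewhere:

    # Per-file counts of OOM errors across CrashPlan's logs
    grep -rc OutOfMemoryError /usr/local/crashplan/log/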
In the configuration directory, while hunting for the file that defines the memory limits, I found a bunch of restart.log files. Upwards of 260k of them. (I’m not kidding – the first time I tried to list the directory, I started a new SSH session to kill ls because it was Taking. So. Long. I actually thought ls had crashed.)
Each and every file seems to have been created when the CrashPlan Engine restarts. So that means CrashPlan ran out of memory and restarted at least 260 thousand times without me knowing.
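If you want to count yours without repeating my mistake, ls -f skips the sorting that makes plain ls crawl on a directory this size:

    # Run from CrashPlan's configuration directory
    ls -f | grep -c '^restart'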
Which leads me to the third problem:
3. Backups have at least one edge case where they’ll fail silently.
This is a screenshot of the most recent CrashPlan report that I got sent:
First is a laptop that hasn’t been connected, so I’ll ignore that. But helium is my media server, and that’s running just fine, right?
Nope. Backups have been failing since March 16, based on the first restart.log file I had.
I had an easy way to check – the main thing I was backing up was my pictures, and those are automatically sorted into folders based on the date they were taken. So after restarting CrashPlan, I pulled up a web browser to look at the restorable files.
And promptly had a mini freak-out. There was a gap between 2012-03-13 and 2012-07-29. And not just because I hadn’t been taking that many photos.
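The local folders covering the gap were sitting right there (the path here is made up; substitute your own photo directory):

    # Date-named folders that exist locally in the March-July gap
    ls -d ~/pictures/2012-0[3-7]-*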
Which means one thing: backups weren’t succeeding, but I was told everything was OK.
This was Not. Good. And I was Not. Impressed.
My “Last backup” times apparently meant “Last connected”. Which means:
4. The Backup status report is misleading
Ironically, the sample backup status report that CrashPlan has in their docs (dated June 15th, 2009) doesn’t have this problem – it shows the two times separately.
I imagine people saw the two times and were confused, so CrashPlan merged them. In my case, that’s an oops, since it covered up a serious problem.
Now, none of this is truly serious, for the simple fact that I haven’t lost any data, and I’m thankful for that. I’m just glad I found this before any data loss did occur, because I’ve been lax in backing up to an alternate location. In fact, it’s vaguely amusing how an easily fixed root cause (CrashPlan running out of RAM, presumably used to stash file metadata) coupled with an over-simplified status report and ineffective monitoring created a much more serious effect.
And how CrashPlan can probably fix this:
- Quick fix: Catch the OutOfMemory errors, and either tell the user or resolve it automatically.
Memory use seems to be based on the size of the files that people are backing up, not the number of files, which makes me think that their block detection/hashing algorithm is what’s chewing up memory. I’d guess that 99% of clients will probably never hit, say, 1.5TB of files.
But for the 1% that do, notify them instead of failing silently. I could have had the problem resolved quickly if I had been told about it. Missing a week (assuming weekly status reports) is far preferable to missing 4 and a half months.
Alternatively, since it’s just a flag on a command line, rewrite the flag automatically. UAC on Windows will cause some problems, but CrashPlan is writing to the Program Data folder perfectly fine, so moving the flags there should work. Some fancy scaling could be brought in (e.g. limit it to no more than 1/2 the system RAM, and up the size by 32MB each time you hit an OOM error) to make it even smoother – something like the sketch below.
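Since CrashPlan hasn’t built this, here’s a rough sketch of that scaling idea as an outside-the-app cron job on Linux. The log directory, the engine restart command, and the assumption of a single -Xmx flag are all guesses about a default install, not anything CrashPlan documents:

    #!/bin/sh
    # Bump CrashPlan's heap cap by 32MB whenever an OOM error shows up
    # in its logs, never exceeding half of physical RAM.
    CONF=/usr/local/crashplan/bin/run.conf
    LOGDIR=/usr/local/crashplan/log    # assumed default Linux location

    # Half of physical RAM, in MB (MemTotal is reported in KB).
    HALF_RAM=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 2048 ))

    if grep -rq OutOfMemoryError "$LOGDIR"; then
        # Assumes exactly one -Xmx flag in the config file.
        CUR=$(grep -o -- '-Xmx[0-9]*[Mm]' "$CONF" | tr -dc '0-9')
        [ -n "$CUR" ] || exit 0
        NEW=$(( CUR + 32 ))                  # up the size by 32MB
        [ "$NEW" -gt "$HALF_RAM" ] && NEW=$HALF_RAM
        if [ "$NEW" -gt "$CUR" ]; then
            sed -i "s/-Xmx[0-9]*[Mm]/-Xmx${NEW}M/" "$CONF"
            /usr/local/crashplan/bin/CrashPlanEngine restart
        fi
    fi

A real version would also need to remember which errors it had already handled, so one old OutOfMemoryError in the logs doesn’t keep bumping the limit forever.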
- Long term fix: Drastically reduce memory usage.
I’m backing up ~3TB of files, and it’s using just under a gig of RAM to hold everything. That sounds pretty suboptimal, especially considering that the error is a heap error (which tends to mean a memory leak somewhere). I can’t speculate much without knowing the internals, so I won’t.
And as far as I can tell, there are no docs on the CrashPlan website that explain how to raise the memory limit.
So for the record: under Windows the config file is at
C:\Program Files\CrashPlan\CrashPlanService.ini, and on Linux it’s at
/usr/local/crashplan/bin/run.conf.
You’re looking for the option “-Xmx256M” – I changed mine to “-Xmx1536M”. Pick a value that suits your file set and test it.
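On Linux, the whole change is a one-liner if you’d rather not open an editor – note the engine script path is my guess at a default install:

    sudo sed -i 's/-Xmx256M/-Xmx1536M/' /usr/local/crashplan/bin/run.conf
    sudo /usr/local/crashplan/bin/CrashPlanEngine restart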
Tip for Windows users on Vista/7: run your text editor as Administrator. We’re editing a file in Program Files, so UAC will silently block saves – which is incredibly annoying until you know what’s causing it.
Also, I thought they might at least standardize on a file location, if not a name, given that they’re using supposedly cross-platform Java. But nope!