While trying to diagnose a problem with my VMs (namely, why starting a fedora 16 based vm fails to bring up the network connection), I ran into a strange issue – on my dom0 and some of the domUs, logrotate hasn’t been running, leaving me with insanely long logfiles!
So I set out trying to understand where the problem came from. Both the f16-based dom0 and domU had old logs, but my f14 domU was still working. That pointed to an update failing. And because logrotate never ran, I’ve got the yum logs of what was installed.
logrotate by default is setup to append the date to the filename, so I knew when logrotate was last run – 20111127. So anything between 27th November and 4th December could be the culprit. To start with, I looked at cron in particular, since that’s what supposed to run logrotate.
When I checked in my dom0, I got this:
[root@elemental log]# grep cron yum.log Dec 01 00:23:46 Updated: cronie-anacron-1.4.8-2.fc14.x86_64 Dec 01 00:23:48 Updated: cronie-1.4.8-2.fc14.x86_64 Dec 01 01:38:54 Updated: cronie-anacron-1.4.8-2.fc15.x86_64 Dec 01 01:38:56 Updated: cronie-1.4.8-2.fc15.x86_64 Dec 01 01:38:58 Updated: crontabs-1.11-2.20101115git.fc15.noarch Dec 01 03:05:41 Updated: cronie-anacron-1.4.8-10.fc16.x86_64 Dec 01 03:05:43 Updated: cronie-1.4.8-10.fc16.x86_64
Which showed me that I upgraded my dom0 to f16 on December 1st. Which is kinda helpful – I did upgrade cronie and cronie-anacron. Also, crontabs was upgraded, but only in F15. Then I looked at my domU. Again, judging from the dates in /var/log, I was looking for something between November 7 – 13. And lo and behold, I found lines that looked exactly like those in my dom0:
[root@lithium log]# grep cron yum.log Aug 20 15:20:05 Updated: cronie-1.4.8-2.fc14.x86_64 Aug 20 15:20:05 Updated: cronie-anacron-1.4.8-2.fc14.x86_64 Nov 07 18:33:50 Updated: cronie-anacron-1.4.8-2.fc15.x86_64 Nov 07 18:33:54 Updated: cronie-1.4.8-2.fc15.x86_64 Nov 07 18:33:54 Updated: crontabs-1.11-2.20101115git.fc15.noarch Nov 07 20:50:42 Updated: cronie-anacron-1.4.8-10.fc16.x86_64 Nov 07 20:50:44 Updated: cronie-1.4.8-10.fc16.x86_64
Except the F14 update was done in August, before the problem started. So, hmm, maybe the upgrade to Fedora 15 broke it. It would certainly explain why the F14 domU is still working fine. And I just happened to have an F15 domU that I had yet to upgrade to F16.
But it was strange – when I checked it, logrotate was working fine. /var/log had stuff timestamped 20120429, so the upgrade to F15 couldn’t have been the problem. logrotate kept an old version of yum.log, so I was able to look for cron related entries:
[root@sodium log]# grep cron yum.log* yum.log:Apr 19 13:16:38 Updated: cronie-anacron-1.4.8-4.fc15.x86_64 yum.log:Apr 19 13:16:40 Updated: cronie-1.4.8-4.fc15.x86_64 yum.log-20120101:Sep 12 18:54:29 Updated: cronie-anacron-1.4.8-2.fc15.x86_64 yum.log-20120101:Sep 12 18:54:30 Updated: cronie-1.4.8-2.fc15.x86_64
Which wasn’t very helpful – I was using virtually the exact same versions that I had upgraded my F16 systems to – 1.4.2. So it wasn’t the F14 -> F15 upgrade. So maybe it was the F15 -> F16 upgrade?
I puzzled over this, then I saw a log named ‘cron’. Which turned out to really be key:
[root@elemental log]# tail -n 5 cron
Dec 1 01:39:01 elemental crond[2011]: (*system*) RELOAD (/etc/cron.d/0hourly)
Dec 1 01:41:01 elemental crond[2011]: (*system*) RELOAD (/etc/cron.d/smolt)
Dec 1 01:43:01 elemental crond[2011]: (*system*) RELOAD (/etc/cron.d/sysstat)
May 1 23:49:29 elemental crontab[9780]: (root) BEGIN EDIT (root)
May 1 23:49:41 elemental crontab[9780]: (root) END EDIT (root)
The crontab made sense – I run crontab -e to see if logrotate was in root’s crontab (which it wasn’t), but the crond entries were interesting – for one, it showed me the exact time it stopped, but also told me that cron has a daemon. And guess what was part of the F15 -> F16 upgrade? Moving stuff to the new systemd.
So I quickly checked crond’s status on a F16 machine:
[root@elemental log]# systemctl status crond.service crond.service - Command Scheduler Loaded: loaded (/lib/systemd/system/crond.service; disabled) Active: inactive (dead) CGroup: name=systemd:/system/crond.service
Which, oops, that looks bad. But the F15 machine looked a lot better:
[root@sodium log]# systemctl status crond.service crond.service - Command Scheduler Loaded: loaded (/lib/systemd/system/crond.service) Active: active (running) since Mon, 30 Apr 2012 14:16:01 +0800; 1 day and 10h ago Main PID: 831 (crond) CGroup: name=systemd:/system/crond.service â 831 /usr/sbin/crond -n
And finally the F14 machine looked like this:
[root@helium init.d]# service crond status crond (pid 1118) is running...
So it seems the fix is to enable crond. I have no clue why crond would be disabled between F15 -> F16 upgrades, but I’m seeing it on two different F16 systems, both of which were upgraded from F15. I might do a clean install of F16 and look, as well as try an F15 -> F16 upgrade, but that’s for later.
For now, I’m doing a systemctl enable crond.service
followed by systemctl start crond.service
on the three F16 systems and calling it a day.