On numerous occasions I have lamented both the design and the typical usage of cron to friends and other geeks. My main gripes:
- jobs are not protected against running concurrently (design issue)
- intervals are fixed (design issue)
- people tend to schedule the same job at the same time on whole clusters of machines (typical usage issue)
The first issue can be fixed with tools like lockfile and setlock, which is a lot of red tape for something that should be a default feature.
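To make that red tape concrete, here is a minimal sketch of what such a wrapper does, written in Python rather than with lockfile or setlock themselves. It assumes a Unix system with `flock(2)` semantics; the names `try_lock`, `run_exclusively`, and the lock path are made up for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location; any path writable by the job's user would do.
DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "example-job.lock")


def try_lock(path=DEFAULT_LOCK):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    # Append mode creates the file if needed without truncating it.
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusively(job, path=DEFAULT_LOCK):
    """Run job() only if no other instance holds the lock; return True if it ran."""
    handle = try_lock(path)
    if handle is None:
        return False  # a previous run is still busy, so skip this one
    try:
        job()
        return True
    finally:
        handle.close()  # closing the file releases the lock
```

A cron job wrapped this way simply skips a run when the previous one has not finished yet. The lock also disappears automatically if the process dies, because the kernel drops it when the file is closed.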
The second and third issues are closely related: both cause undesirable load spikes because many things happen at once, either because intervals line up or because similar jobs run on a bunch of machines at the same time, perhaps hammering the network.
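One common way to soften both kinds of spike is to spread start times out. The sketch below is only an illustration of that idea, not something cron offers: `splay_seconds` derives a stable per-host delay from the hostname, and `jittered_interval` wobbles a fixed interval so that periodic jobs do not stay in lockstep. Both function names and the default 10% fraction are my own assumptions.

```python
import hashlib
import random
import socket


def splay_seconds(job_name, window_seconds, host=None):
    """Return a stable delay in [0, window_seconds) for this host and job."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    host = host or socket.gethostname()
    # Hash host and job together so each machine gets its own fixed slot.
    digest = hashlib.sha256(f"{host}:{job_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def jittered_interval(base_seconds, fraction=0.1, rng=random):
    """Return base_seconds shifted by a random amount of at most +/- fraction."""
    spread = base_seconds * fraction
    return base_seconds + rng.uniform(-spread, spread)
```

A job scheduled at the top of the hour on every machine could sleep for `splay_seconds("backup", 3600)` before doing any work, which turns one cluster-wide spike into a roughly even trickle across the hour. The per-host delay is deterministic, so each machine still runs at a predictable time.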
A specific pet peeve is Mailman Reminder Day. Firstly, I just don’t see the point; if my address is on a mailing list and that list is practically dead, I just don’t care. Secondly, it means every first of the month I have tens of reminders that I just delete. Some of these lists are busy; for those, the reminder is a minor nuisance. But many lists are extremely quiet (think software release announcements) and for some of these lists, the reminders are over 50% of the total mail volume. It’s so wasteful. Also, I can’t help but think that all those reminders being sent out at the same time (well, divided over 24 hours because of time zone differences) cannot be good for the mail ecosystem as a whole.
For issues two and three, Colm MacCárthaigh wrote a few great posts detailing why cron is bad and showcasing one potential solution to the issues at hand. I suggest reading these posts in full; they are very insightful.
This post was inspired by my pet peeves about Mailman and about jobs unintentionally running in parallel. It was triggered by Job Snijders pointing me to another interesting post; Colm’s posts above were referred to in the comments.