We are using Delayed Jobs to support some background processing. https://github.com/collectiveidea/delayed_job It has been pretty OK overall, mostly because the jobs die sometimes and we have limited control over getting them running b/c we work in a Financial Services industry and our audit/protocols have locked us out of our production environment (sucks).
We have monit installed, but it doesn’t seem to be working properly. https://mmonit.com/monit/
So, I dug in on this one and wrote a bash script that does the following on deployment
- First we shutoff monit and kill and delayed job workers. They are running an old instantiation of the code.
- Then we feed permissions for our apache user to run delayed job commands as root (monit runs them as root) so we can control the jobs via the application
- Then for the # of workers we want, we create a pidfile process monitor for each one.
- I added and uptime > 48hours check to this to keep things sane in the event our workers live too long. We have a job that tries to cycle them every 8 hours, but it travels through a load balancer so we can’t be sure it will work.
- I should also note we have a multi-instance scaling group in AWS so we’ll never (very very very rarely) have ALL of our jobs cycle at once (we have > 3 servers)