Introduction
Many Rails applications require periodic background processes to perform work outside of the Rack environment that processes web and API requests. On bare-metal servers and virtual machines, those background jobs are easily scheduled with the operating system's cron system, but what can we do in container environments where cron is not available? While working on KUY.io Konnect, we were looking for a robust, flexible and batteries-included option for scheduling periodic and recurring background tasks, such as synchronizing the user database with an LDAP server or collecting system and network statistics. As it turns out, there are a number of great Ruby-native solutions available that integrate really well with any process manager, like foreman or overmind, that you are probably already using in your containerized Rails applications.
One of the greatest things about Rails is the enormous ecosystem of well-maintained, battle-proven and deployed-in-production solutions to just about any problem that you can think of. In fact, this ecosystem includes some of the best background job processing frameworks out there. Let's take a look at how they support periodic background processes and how we can use that functionality in containerized Rails applications.
Sidekiq
Sidekiq is the de facto standard for high-performance background job processing in Ruby. Since it holds the job queue in a Redis store, Sidekiq requires a separate Redis deployment running alongside your application. Sidekiq also includes a dashboard, paid versions for commercial deployments, and a number of helpful plugins. One such plugin for the periodic execution of background jobs is the sidekiq-scheduler gem. It allows us to define periodic jobs in a .yml file with a syntax closely resembling cron.
# config/sidekiq_schedule.yml
hello_world:
  cron: '0 * * * * *' # Runs once per minute
  queue: default
  class: HelloWorld
  description: 'Says Hello'
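The six-field expression above carries a leading seconds field (second, minute, hour, day of month, month, day of week), which is why '0 * * * * *' fires once per minute, at second 0. Classic five-field crontab expressions simply omit the seconds. To make the field order concrete, here is a toy matcher written for illustration only; it is not the parser sidekiq-scheduler actually relies on, and it supports just '*', plain numbers, '*/n' steps and comma lists.

```ruby
# Toy cron matcher for illustration only; not the real scheduler's parser.
# Supports '*', plain numbers, '*/n' steps and comma-separated lists.
# Field order: second minute hour day-of-month month day-of-week.
# Five-field (classic crontab) expressions get an implicit "0" seconds field.
module ToyCron
  # Lowest legal value per field; '*/n' steps count from here.
  FIELD_MINIMUMS = [0, 0, 0, 1, 1, 0].freeze

  module_function

  def field_matches?(spec, value, minimum)
    spec.split(',').any? do |part|
      case part
      when '*' then true
      when %r{\A\*/(\d+)\z} then ((value - minimum) % Regexp.last_match(1).to_i).zero?
      when /\A\d+\z/ then part.to_i == value
      else raise ArgumentError, "unsupported cron field: #{part}"
      end
    end
  end

  def matches?(expression, time)
    fields = expression.split
    fields.unshift('0') if fields.size == 5
    raise ArgumentError, 'expected 5 or 6 fields' unless fields.size == 6

    values = [time.sec, time.min, time.hour, time.day, time.month, time.wday]
    fields.zip(values, FIELD_MINIMUMS).all? do |spec, value, minimum|
      field_matches?(spec, value, minimum)
    end
  end
end

t = Time.new(2024, 1, 1, 10, 15, 0)
ToyCron.matches?('0 * * * * *', t)     # => true (second 0 of minute 10:15)
ToyCron.matches?('0 * * * * *', t + 1) # => false
ToyCron.matches?('*/5 * * * *', t)     # => true (minute 15 is a multiple of 5)
```

Real cron implementations also handle ranges, names like MON, and the special day-of-month/day-of-week interplay, all of which this sketch ignores.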
With the schedule defined, we can load and execute it every time the app starts up by placing the scheduling code in an initializer, such as config/initializers/sidekiq_schedule.rb:
# config/initializers/sidekiq_schedule.rb
require 'sidekiq'
require 'sidekiq-scheduler'
Sidekiq.configure_server do |config|
  config.on(:startup) do
    # __dir__ is config/initializers, so '../sidekiq_schedule.yml' resolves to config/sidekiq_schedule.yml
    Sidekiq.schedule = YAML.load_file(File.expand_path('../sidekiq_schedule.yml', __dir__))
    SidekiqScheduler::Scheduler.instance.reload_schedule!
  end
end
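Because the initializer loads the YAML file as-is, a typo such as a missing class only surfaces once the scheduler tries to enqueue the job. A minimal sketch of failing fast at startup, using a hypothetical validate_schedule! helper (not part of sidekiq-scheduler) that checks only a small subset of keys: a class plus either cron or every. The gem supports more scheduling keys than this, so treat the checks as an assumption to adapt.

```ruby
require 'yaml'

# Hypothetical helper: reject obviously broken schedule entries before the
# hash is handed to the scheduler. Only checks 'class' and 'cron'/'every'.
def validate_schedule!(schedule)
  raise ArgumentError, 'schedule must be a Hash' unless schedule.is_a?(Hash)

  schedule.each do |name, job|
    raise ArgumentError, "#{name}: entry must be a Hash" unless job.is_a?(Hash)
    raise ArgumentError, "#{name}: missing 'class'" if job['class'].to_s.empty?
    unless job.key?('cron') || job.key?('every')
      raise ArgumentError, "#{name}: needs a 'cron' or 'every' key"
    end
  end
  schedule
end

yaml = <<~YAML
  hello_world:
    cron: '0 * * * * *' # Runs once per minute
    queue: default
    class: HelloWorld
    description: 'Says Hello'
YAML

schedule = validate_schedule!(YAML.safe_load(yaml))
```

In the initializer, the same idea would wrap the load: Sidekiq.schedule = validate_schedule!(YAML.load_file(...)).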
A great feature of sidekiq-scheduler is its support for dynamic schedules, which are super useful when we don't know the periodic jobs or their execution frequencies ahead of time, for instance, if background jobs are spawned off of user interactions in the web UI. To modify a schedule after application startup, we can simply do:
Sidekiq.set_schedule('heartbeat', { 'every' => ['1m'], 'class' => 'HeartbeatWorker' })
Note: the dynamic scheduling feature requires the :dynamic flag to be set to true in our config/sidekiq.yml file; otherwise, we need to reload the schedule manually after modifying it:
SidekiqScheduler::Scheduler.instance.reload_schedule!
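When the interval comes from a web form, it is worth validating it before it reaches Sidekiq.set_schedule. The sketch below uses a hypothetical every_schedule helper that builds the same hash shape as the heartbeat example and only accepts simple "number plus unit" durations. Restricting units to s, m, h and d is our own conservative choice; the underlying scheduler may accept more formats.

```ruby
# Hypothetical helper: turn a user-supplied interval into the hash shape
# passed to Sidekiq.set_schedule, rejecting anything that is not a simple
# "<positive number><unit>" duration (units restricted to s, m, h, d).
INTERVAL_FORMAT = /\A[1-9]\d*[smhd]\z/

def every_schedule(worker_class, interval)
  unless INTERVAL_FORMAT.match?(interval)
    raise ArgumentError, "invalid interval: #{interval.inspect}"
  end

  { 'every' => [interval], 'class' => worker_class }
end

config = every_schedule('HeartbeatWorker', '1m')
# Same shape as the heartbeat example; in the app we would then call:
# Sidekiq.set_schedule('heartbeat', config)
```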
How to use it with containerized Rails apps
One of the great advantages of sidekiq and sidekiq-scheduler is that there are no differences in use between bare-metal / virtual machine deployments and containerized Rails apps. We simply define the sidekiq workers in our Procfile the same way as we would in a non-containerized app, and sidekiq-scheduler, loaded through our initializer, will start pushing the defined jobs into the sidekiq job queues automatically.
# Procfile
web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: bundle exec sidekiq
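A Procfile is just a list of "name: command" lines, and a process manager spawns one process per entry. That is why a single worker line is enough to get the scheduler running next to the web server. A toy reader for illustration only (foreman, overmind and hosting platforms have their own, more complete parsers):

```ruby
# Toy Procfile reader for illustration only. Each non-blank, non-comment
# line is "<process name>: <command>"; split on the first colon only so
# that commands like ${PORT:-5000} stay intact.
def parse_procfile(text)
  text.each_line.with_object({}) do |line, processes|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    name, command = line.split(':', 2)
    processes[name.strip] = command.to_s.strip
  end
end

procfile = <<~PROCFILE
  # Procfile
  web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
  worker: bundle exec sidekiq
PROCFILE

processes = parse_procfile(procfile)
```

Each key (web, worker) becomes a separate process type, which is also the unit that Procfile-aware platforms scale independently.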
Resque
Resque is another popular and mature background job processing framework built on top of a Redis backend. It also sports a plugin, called resque-scheduler, that allows you to define a schedule for recurring jobs in a .yml file.
# config/resque_schedule.yml
cancel_abandoned_orders:
  cron: "*/5 * * * *"
  class: "CancelAbandonedOrders"
  description: 'This job cancels all abandoned orders'
  queue: cleanup
Working off the schedule can be done either through a rake task or with the stand-alone resque-scheduler executable that is installed with the gem.
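# lib/tasks/resque_scheduler.rake (Rails only loads .rake files from lib/tasks)
# Resque tasks
require 'resque/tasks'
require 'resque/scheduler/tasks'
namespace :resque do
  # Load the Rails environment so that workers can find our job classes
  task :setup => :environment do
    require 'resque'
    # you probably already have this somewhere; in containers Redis usually runs
    # as a separate service (assumes a REDIS_URL variable, falling back to localhost)
    Resque.redis = ENV.fetch('REDIS_URL', 'localhost:6379')
  end
  task :setup_schedule => :setup do
    require 'resque-scheduler'
    Resque::Scheduler.dynamic = true
    Resque.schedule = YAML.load_file(Rails.root.join('config/resque_schedule.yml'))
  end
  task :scheduler => :setup_schedule
end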
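The last line, task :scheduler => :setup_schedule, does not define a new task. We assume resque:scheduler is already defined by resque/scheduler/tasks, and in Rake, declaring an existing task again simply adds prerequisites to it. The stand-alone sketch below uses plain Rake with a stub in place of the gem's task to show the resulting execution order.

```ruby
require 'rake'

# Rake's DSL is available automatically in Rakefiles; make it explicit here.
extend Rake::DSL

CALLS = []

# Stub standing in for the resque:scheduler task the gem defines.
namespace :resque do
  task(:scheduler) { CALLS << 'resque:scheduler (gem action)' }
end

# Our tasks, mirroring the rake file above.
namespace :resque do
  task(:setup) { CALLS << 'resque:setup' }
  task(:setup_schedule => :setup) { CALLS << 'resque:setup_schedule' }
  # Re-declaring an existing task only adds a prerequisite to it.
  task :scheduler => :setup_schedule
end

Rake::Task['resque:scheduler'].invoke
# CALLS now lists setup, then setup_schedule, then the gem's scheduler action
```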
In addition to schedules defined ahead of time in a .yml file, resque-scheduler also allows us to modify schedules programmatically from the Rails application, for example, as a result of a user interaction. We love this feature, and make heavy use of it in our uptime monitoring and status reporting software Heimdall, where we allow users to dynamically create infrastructure health checks through a web interface.
name = 'send_emails'
config = {
  class: 'SendEmail',
  args: 'POC email subject',
  every: ['1h', { first_in: 5.minutes }],
}
Resque.set_schedule(name, config)
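As we understand it, the every option is handed to the underlying rufus-scheduler, where first_in delays only the first run and later runs follow at the regular interval. The sketch below works out the run times that reading implies for every: ['1h', { first_in: 5.minutes }]; expected_run_times is a hypothetical helper, and durations are plain seconds so the example does not need ActiveSupport.

```ruby
# Illustration only: run times implied by every: ['1h', { first_in: 5.minutes }],
# assuming first_in delays the first run and later runs follow every interval.
# Durations are plain seconds here (no ActiveSupport).
def expected_run_times(start, every:, first_in:, count:)
  first = start + first_in
  Array.new(count) { |i| first + i * every }
end

start = Time.utc(2024, 1, 1, 12, 0, 0)
runs = expected_run_times(start, every: 60 * 60, first_in: 5 * 60, count: 3)
runs.map { |t| t.strftime('%H:%M') } # => ["12:05", "13:05", "14:05"]
```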
How to use it with containerized Rails apps
With resque and resque-scheduler, there are no differences in use between bare-metal / virtual machine deployments and containerized Rails apps. We simply define the resque workers in our Procfile the same way as we would in a non-containerized app, and add a resque-scheduler process to the Procfile:
# Procfile
web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: QUEUE=* bundle exec rake resque:work
scheduler: bundle exec rake resque:scheduler
# or scheduler: bundle exec resque-scheduler
GoodJob
GoodJob is a relatively new background job processing framework. Unlike Sidekiq or Resque, it uses a Postgres database as the backing store to queue jobs. It has full support for the latest ActiveJob features, such as async, queues, delays, priorities, timeouts and retries, and relies on Concurrent::Ruby for concurrency. To learn more about the story behind GoodJob, you can read this introductory blog post. GoodJob recently added native support for cron-style repeating and recurring jobs without the need for an extra plugin.
Unlike with the other processing frameworks, repeating and recurring jobs are not specified in a separate .yml file, but directly in the configuration of your Rails app.
# config/application.rb or a specific environment, e.g. config/environments/production.rb
# Enable cron in this process; e.g. only run on the first Heroku worker process
config.good_job.enable_cron = true
# Configure cron with a hash that has a unique key for each recurring job
config.good_job.cron = {
  # Every 15 minutes, enqueue `ExampleJob.set(priority: -10).perform_later(42, name: "Alice")`
  frequent_task: { # each recurring job must have a unique key
    cron: "*/15 * * * *", # cron-style scheduling format by fugit gem
    class: "ExampleJob", # reference the Job class with a string
    args: [42, { name: "Alice" }], # arguments to pass; can also be a proc
    set: { priority: -10 }, # additional ActiveJob properties; can also be a lambda/proc
    description: "Something helpful", # optional description
  },
  another_task: {
    cron: "0 0,12 * * *",
    class: "AnotherJob",
  },
  # etc.
}
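The comments above note that args "can also be a proc" and set "can also be a lambda/proc". We assume the point is lazy evaluation: a plain value is fixed once when the config is read, while a callable can compute fresh values each time the job is enqueued. The sketch shows that resolution pattern with a hypothetical resolve helper; GoodJob's own handling may differ in details.

```ruby
require 'time'

# Hypothetical resolution of a config value that "can also be a proc":
# plain values are used as-is, callables are invoked on every enqueue.
def resolve(value)
  value.respond_to?(:call) ? value.call : value
end

static_args  = [42, { name: 'Alice' }]
dynamic_args = -> { [Time.now.utc.iso8601] }

resolve(static_args)  # the very same array on every enqueue
resolve(dynamic_args) # a fresh one-element array with the current UTC timestamp
```

In the config, something like args: -> { [Time.now.utc.iso8601] } would therefore, presumably, give each run its own timestamp instead of the one from boot time.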
How to use it with containerized Rails apps
With GoodJob, there are no differences in use between bare-metal / virtual machine deployments and containerized Rails apps. We simply define the good_job worker in our Procfile the same way as we would in a non-containerized app:
# Procfile
web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: bundle exec good_job start
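The config comment suggests enabling cron only on the first Heroku worker process, so that scaling the worker process type does not turn every worker into a scheduler. Heroku exposes the dyno name in the DYNO environment variable (e.g. worker.1, worker.2). A minimal sketch under that assumption; cron_enabled? is a hypothetical helper, and other platforms would need a different signal.

```ruby
# Sketch assuming a Heroku-style DYNO variable ("web.1", "worker.1", ...).
# Only the first worker enables cron; every other process leaves it off.
def cron_enabled?(env = ENV)
  env.fetch('DYNO', '') == 'worker.1'
end

# Hypothetical wiring in the Rails config:
#   config.good_job.enable_cron = cron_enabled?
```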
DelayedJob
DelayedJob is probably the oldest background job processing framework in the Ruby ecosystem. It was extracted from Shopify, where it did the heavy lifting for a multitude of core tasks such as image resizing, downloads, batch imports, newsletter campaigns, and search index updates. Through plugins, DelayedJob can use different backing stores, such as ActiveRecord or MongoDB.
Unlike the other background job processing frameworks described in this article, DelayedJob doesn't have direct support for recurring or repeating jobs. However, using the clockwork gem, we can retrofit cron-style scheduling of jobs onto DelayedJob. Clockwork acts as a full cron replacement: it runs as a lightweight, long-running Ruby process which sits alongside our web and worker processes (DJ/Resque/Minion/Stalker) to schedule recurring work at particular times or dates.
We simply define our job schedule in a clock.rb file:
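# clock.rb
require 'clockwork'
require './config/boot'
require './config/environment'
module Clockwork
  # perform_later only enqueues the job (assuming ActiveJob's :delayed_job adapter),
  # so a DelayedJob worker runs it; perform_now would run it inside the clock process
  every(10.seconds, 'system.stats') do
    SystemStatsJob.perform_later
  end
  every(5.minutes, 'ldap.sync') do
    LDAPSyncJob.perform_later
  end
end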
We can then run the clockwork process, responsible for queuing jobs at the defined intervals, by adding it to our Procfile:
# Procfile
web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: bundle exec rails jobs:work
clockwork: bundle exec clockwork clock.rb
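Conceptually, the clockwork process only decides when something is due and hands it off; the DelayedJob worker does the actual work. The toy scheduler below illustrates that loop for illustration only. ToyClock is hypothetical, and clockwork's real implementation differs (it adds threading, at: times, error handling and more).

```ruby
# Toy in-process scheduler for illustration only; not clockwork's code.
class ToyClock
  Event = Struct.new(:period, :name, :block, :last_run)

  def initialize
    @events = []
  end

  # Register a block to run every `period` seconds.
  def every(period, name, &block)
    @events << Event.new(period, name, block, nil)
  end

  # Run every event that is due at `now`; returns the names that ran.
  def tick(now)
    due = @events.select { |e| e.last_run.nil? || now - e.last_run >= e.period }
    due.each do |event|
      event.block.call
      event.last_run = now
    end
    due.map(&:name)
  end
end

clock = ToyClock.new
queue = [] # stands in for the job queue the workers read from
clock.every(10, 'system.stats') { queue << :system_stats }
clock.every(300, 'ldap.sync')   { queue << :ldap_sync }

t0 = Time.utc(2024, 1, 1)
clock.tick(t0)      # both events run on the first tick
clock.tick(t0 + 10) # only system.stats is due again
```

A real clock process calls something like tick in an endless loop with a short sleep, which is why it has to run as its own long-lived Procfile entry.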
Summary
We have seen that adding support for repeating and recurring background job processing is straightforward with most of the popular background job processing frameworks for Ruby and Rails. Yet, there are a few things to keep in mind when executing background jobs in containerized application environments:
- Containers are ephemeral. Depending on the container orchestration platform, containers can be created, terminated or rescheduled at any time. We need to consider that our scheduler and worker processes can be interrupted in the middle of processing, and that multiple copies of the worker processes may run simultaneously. As a rule of thumb, we want to make sure to only ever run a single scheduling process, responsible for queueing the background jobs to be worked on by one or more workers.
- Ditch cron for Clockwork. Cron jobs are spawned by the OS and execute with their own execution environment and configuration. We are always better off running at least a Clockwork process instead, which can reuse our Rails environment and configuration natively.
- Process managers are our friends. Responsible for spawning, monitoring, and keeping alive the multiple processes that make up the entire application stack, process managers such as foreman, hivemind, or overmind are the ideal companions for containerized Rails applications. With orchestration platforms like Heroku, Deis or Dokku offering native Procfile support and fine-grained scaling per process type, running scheduler processes alongside our worker and web processes makes total sense.
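Because containers can be restarted mid-run and more than one scheduler copy may briefly exist, one possible mitigation (our own suggestion, not part of any framework above) is to make enqueueing idempotent per job and time slot. The sketch below uses a hypothetical SlotDeduplicator backed by an in-memory Set; in production, the "seen" record would have to live somewhere shared, such as a unique database index or a Redis key, for this to work across containers.

```ruby
require 'set'

# Illustration only: allow each (job name, time slot) pair to be claimed once,
# so a second scheduler copy or a retried enqueue does not duplicate work.
# The in-memory Set is a stand-in for shared storage.
class SlotDeduplicator
  def initialize
    @seen = Set.new
  end

  # Returns true only the first time a given job/slot pair is claimed.
  def claim?(job_name, time, interval_seconds)
    slot = time.to_i / interval_seconds
    @seen.add?([job_name, slot]) ? true : false
  end
end

dedup = SlotDeduplicator.new
now = Time.utc(2024, 1, 1, 12, 0, 30)
dedup.claim?('ldap.sync', now, 300)       # first scheduler copy: true
dedup.claim?('ldap.sync', now + 5, 300)   # second copy, same slot: false
dedup.claim?('ldap.sync', now + 300, 300) # next slot: true
```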
How can we help?
If you have any questions about KUY.io or are looking for advice with implementing background job processing workloads in your next project, please feel free to reach out to us!
Nicolas Bettenburg, CEO and Founder of KUY.io