Executing Periodic Background Jobs In Dockerized Rails Applications

Introduction

Many Rails applications need periodic background processes that do work outside of the Rack environment serving web and API requests. On bare-metal servers and virtual machines, such jobs are easily scheduled with the operating system's cron, but what can we do in container environments where cron is not available? While working on KUY.io Konnect, we were looking for a robust, flexible and batteries-included option for scheduling periodic and recurring background tasks, such as synchronizing the user database with an LDAP server or collecting system and network statistics. As it turns out, there are a number of great Ruby-native solutions that integrate well with process managers like foreman or overmind, which you are probably already using in your containerized Rails applications.

Introduction
Sidekiq
Resque
GoodJob
DelayedJob
Summary
How can we help?

One of the greatest things about Rails is the enormous ecosystem of well maintained, battle-proven and deployed-in-production solutions to just about any problem that you can think of. Actually, Rails features some of the best background job processing frameworks out there. Let's take a look at how they support periodic background processes and how we can use that functionality in containerized Rails applications.

Sidekiq

Sidekiq is the de-facto standard for high-performance background job processing in Ruby. Since it holds the job queue in a Redis store, Sidekiq requires a separate Redis deployment that runs alongside your application. Sidekiq also includes a dashboard, paid versions for commercial deployments, and a number of helpful plugins. One such plugin for periodic execution of background jobs is the sidekiq-scheduler gem. It lets us define periodic jobs in a .yml file with a syntax closely resembling cron.

# config/sidekiq_schedule.yml

hello_world:
    cron: '0 * * * * *'   # Runs once per minute
    queue: default
    class: HelloWorld
    description: 'Says Hello'

With the schedule defined, we can load it every time the app starts up by placing the scheduling code in an initializer, such as config/initializers/sidekiq_schedule.rb:

# config/initializers/sidekiq_schedule.rb

require 'sidekiq'
require 'sidekiq-scheduler'

Sidekiq.configure_server do |config|
  config.on(:startup) do
    # Two levels up from this file: config/initializers/ -> config/
    Sidekiq.schedule = YAML.load_file(File.expand_path('../../sidekiq_schedule.yml', __FILE__))
    SidekiqScheduler::Scheduler.instance.reload_schedule!
  end
end
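A typo in the schedule file usually only surfaces when the worker boots, so it can be worth checking the parsed YAML first. The following is a minimal sketch, not part of sidekiq-scheduler: `schedule_problems` is a hypothetical helper, and it assumes that every job uses only the `cron` or `every` keys shown in this article.

```ruby
require 'yaml'

# Hypothetical helper (not part of sidekiq-scheduler): returns a list of
# human-readable problems found in a parsed schedule hash.
# Assumption: jobs are scheduled with either 'cron' or 'every'.
def schedule_problems(schedule)
  return ['schedule must be a mapping of job names to settings'] unless schedule.is_a?(Hash)

  schedule.flat_map do |name, settings|
    next ["#{name}: settings must be a mapping"] unless settings.is_a?(Hash)

    problems = []
    problems << "#{name}: missing 'class'" unless settings['class'].is_a?(String)
    unless settings.key?('cron') || settings.key?('every')
      problems << "#{name}: needs 'cron' or 'every'"
    end
    problems
  end
end

# Sample data for illustration; the second entry is deliberately incomplete.
SAMPLE_SCHEDULE = YAML.safe_load(<<~SCHEDULE)
  hello_world:
    cron: '0 * * * * *'
    queue: default
    class: HelloWorld
  broken_job:
    queue: default
SCHEDULE

puts schedule_problems(SAMPLE_SCHEDULE)
```

In the initializer, a check like this could run before assigning `Sidekiq.schedule` and raise if the list is not empty, so that a broken schedule fails the deployment instead of silently skipping jobs.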

A great feature of sidekiq-scheduler is the support for dynamic schedules, which are super useful when we don't know the periodic jobs or their execution frequencies ahead of time, for instance, if background jobs are spawned off of user interactions in the web UI. To modify a schedule after application startup, we can simply do:

Sidekiq.set_schedule('heartbeat', { 'every' => ['1m'], 'class' => 'HeartbeatWorker' })

Note: the dynamic scheduling feature requires the :dynamic flag to be set to true in our config/sidekiq.yml file. Otherwise, we need to reload the schedule manually after modifying it:

SidekiqScheduler::Scheduler.instance.reload_schedule!

How to use it with containerized Rails apps

One of the great advantages of sidekiq and sidekiq-scheduler is that there are no differences in use between bare-metal / virtual machine deployments and containerized Rails apps. We simply define the sidekiq workers in our Procfile the same way as we would in a non-containerized app, and sidekiq-scheduler through our initializer will start pushing the defined jobs into the sidekiq job queues automatically.

# Procfile

web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: bundle exec sidekiq

Resque

Resque is another popular and mature background job processing framework built on top of a Redis backend. It also has a plugin, resque-scheduler, that allows us to define a schedule for recurring jobs in a .yml file.

# config/resque_schedule.yml

cancel_abandoned_orders:
    cron: "*/5 * * * *"
    class: "CancelAbandonedOrders"
    description: 'This job cancels all abandoned orders'
    queue: cleanup

Working off the schedule can be done either through a rake task, or with the stand-alone executable resque-scheduler that is installed with the gem.

# lib/tasks/resque_scheduler.rake

# Resque tasks
require 'resque/tasks'
require 'resque/scheduler/tasks'

namespace :resque do
  task :setup do
    require 'resque'

    # you probably already have this somewhere
    Resque.redis = 'localhost:6379'
  end

  task :setup_schedule => :setup do
    require 'resque-scheduler'

    Resque::Scheduler.dynamic = true
    Resque.schedule = YAML.load_file(Rails.root.join('config/resque_schedule.yml'))
  end

  task :scheduler => :setup_schedule
end

In addition to schedules defined ahead of time in a .yml file, resque-scheduler also allows us to modify schedules programmatically from the Rails application, for example, as a result of a user interaction. We love this feature and make heavy use of it in our uptime monitoring and status reporting software Heimdall, where we allow users to dynamically create infrastructure health checks through a web interface.

name = 'send_emails'
config = {
    class: 'SendEmail',
    args: 'POC email subject',
    every: ['1h', {first_in: 5.minutes}],
}
Resque.set_schedule(name, config)
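When schedule entries come from user input, it helps to build the name and config hash in one place. The following is a minimal sketch under stated assumptions: `HealthCheck`, `HealthCheckJob`, the `health_checks` queue and `schedule_entry_for` are hypothetical names for illustration, not part of resque-scheduler or of Heimdall.

```ruby
# Hypothetical value object standing in for a record created through the web UI.
HealthCheck = Struct.new(:id, :url, :interval_minutes, keyword_init: true)

# Builds the schedule name and config hash for one health check, in the
# shape passed to Resque.set_schedule in the example above.
def schedule_entry_for(check)
  unless check.interval_minutes.to_i >= 1
    raise ArgumentError, 'interval must be at least one minute'
  end

  name = "health_check_#{check.id}" # one unique schedule name per check
  config = {
    class: 'HealthCheckJob',         # hypothetical job class
    args: [check.id],                # the job looks the check up by id
    every: ["#{check.interval_minutes}m"],
    queue: 'health_checks'           # hypothetical queue name
  }
  [name, config]
end

check = HealthCheck.new(id: 7, url: 'https://example.com', interval_minutes: 5)
name, config = schedule_entry_for(check)
# In the Rails app this pair would be handed to: Resque.set_schedule(name, config)
```

Deriving the schedule name from the record id means that updating a check overwrites its existing entry instead of creating a second one.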

How to use it with containerized Rails apps

With resque and resque-scheduler, there are no differences in use between bare-metal / virtual machine deployments and containerized Rails apps. We simply define the resque workers in our Procfile the same way as we would in a non-containerized app, and add a resque-scheduler worker to the Procfile:

# Procfile

web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: bundle exec rake resque:work
scheduler: bundle exec rake resque:scheduler
# or scheduler: bundle exec resque-scheduler

GoodJob

GoodJob is a relatively new background job processing framework. Unlike Sidekiq or Resque, it uses a Postgres database as the backing store to queue jobs. It has full support for the latest ActiveJob features such as async, queues, delays, priorities, timeouts and retries, and handles concurrency with Concurrent Ruby. To learn more about the story behind GoodJob, you can read its introductory blog post. GoodJob recently added native support for cron-style repeating and recurring jobs without the need for an extra plugin.

Unlike the other processing frameworks, repeating and recurring jobs are not specified in a separate .yml file, but directly in the environment config of your Rails app.

# config/application.rb or a specific environment, e.g. config/environments/production.rb

# Enable cron in this process; e.g. only run on the first Heroku worker process
config.good_job.enable_cron = true

# Configure cron with a hash that has a unique key for each recurring job
config.good_job.cron = {
  # Every 15 minutes, enqueue `ExampleJob.set(priority: -10).perform_later(42, name: "Alice")`
  frequent_task: { # each recurring job must have a unique key
    cron: "*/15 * * * *", # cron-style scheduling format by fugit gem
    class: "ExampleJob", # reference the Job class with a string
    args: [42, { name: "Alice" }], # arguments to pass; can also be a proc
    set: { priority: -10 }, # additional ActiveJob properties; can also be a lambda/proc
    description: "Something helpful", # optional description
  },
  another_task: {
    cron: "0 0,12 * * *",
    class: "AnotherJob",
  },
  # etc.
}
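To make the two cron expressions above concrete, here is a simplified matcher written for illustration only. GoodJob itself relies on the fugit gem for parsing; this sketch is not how fugit works internally, it handles only `*`, `*/n` and comma-separated numbers, and it ignores the special rule real cron applies when both day-of-month and day-of-week are restricted. The names `cron_field_match?` and `cron_match?` are hypothetical.

```ruby
# Checks one cron field (e.g. "*/15" or "0,12") against one time component.
# Simplified: supports "*", "*/n" and comma-separated numbers only.
def cron_field_match?(field, value)
  field.split(',').any? do |part|
    case part
    when '*' then true
    when %r{\A\*/(\d+)\z} then (value % Regexp.last_match(1).to_i).zero?
    when /\A\d+\z/ then part.to_i == value
    else raise ArgumentError, "unsupported cron field: #{part}"
    end
  end
end

# Checks a five-field expression (minute hour day month weekday) against a Time.
def cron_match?(expression, time)
  minute, hour, day, month, weekday = expression.split
  cron_field_match?(minute, time.min) &&
    cron_field_match?(hour, time.hour) &&
    cron_field_match?(day, time.day) &&
    cron_field_match?(month, time.month) &&
    cron_field_match?(weekday, time.wday)
end

half_past_noon = Time.utc(2024, 1, 1, 12, 30)
puts cron_match?('*/15 * * * *', half_past_noon) # minute 30 is a multiple of 15
puts cron_match?('0 0,12 * * *', half_past_noon) # minute must be 0, so no match
```

Read this way, `*/15 * * * *` fires at minutes 0, 15, 30 and 45 of every hour, and `0 0,12 * * *` fires twice a day, at midnight and at noon.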

How to use it with containerized Rails apps

With GoodJob there are no differences in use between bare-metal / virtual machine deployments and containerized Rails apps. We simply define the good_job worker in our Procfile the same way as we would in a non-containerized app:

# Procfile

web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: bundle exec good_job start

DelayedJob

Probably the oldest background job processing framework for the Ruby ecosystem is DelayedJob. It was extracted from Shopify where it was doing the heavy lifting for a multitude of core tasks such as image resizing, downloads, batch imports, newsletter campaigns, or search index updates. DelayedJob can be configured by using plugins to use different backing stores, such as ActiveRecord or MongoDB.

Unlike the other background job processing frameworks described in this article, DelayedJob doesn't have direct support for recurring or repeating jobs. However, using the clockwork gem, we can retrofit cron-style scheduling of jobs with DelayedJob. Clockwork acts as a full cron replacement: it runs as a lightweight, long-running Ruby process which sits alongside our web and worker processes (DJ/Resque/Minion/Stalker) to schedule recurring work at particular times or dates.

We simply define our job schedule in a clock.rb file:

require 'clockwork'
require './config/boot'
require './config/environment'

module Clockwork
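  # Enqueue the jobs for the worker process instead of running them
  # inline, so that a slow job cannot block the clock process.
  every(10.seconds, 'system.stats') do
    SystemStatsJob.perform_later
  end

  every(5.minutes, 'ldap.sync') do
    LDAPSyncJob.perform_later
  end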
end

We can then run the clockwork process, responsible for queuing jobs at the defined intervals, by adding it to our Procfile:

# Procfile
web: bundle exec rails server -b 0.0.0.0 -p ${PORT:-5000}
worker: bundle exec rails jobs:work
clockwork: bundle exec clockwork clock.rb

Summary

We have seen that adding support for repeating and recurring background job processing is straightforward with most of the popular background job processing frameworks for Ruby and Rails. Yet, there are a few things to keep in mind when executing background jobs in containerized application environments:

Containers are ephemeral. Depending on the container orchestration platform, containers can be created, terminated or rescheduled at any time. We need to consider that our scheduler and worker processes can be interrupted in the middle of processing, and that there can be multiple copies of the worker processes running simultaneously. As a rule of thumb, we want to make sure to only ever run a single scheduling process, responsible for queueing the background jobs to be worked on by one or more workers.
Ditch cron for Clockwork. Cron jobs are spawned by the OS and run with their own execution environment and configuration. Even where cron is available, we are better off running a Clockwork process, which reuses our Rails environment and configuration natively.
Process Managers are our Friends. Responsible for spawning, monitoring, and keeping alive the multiple processes that make up the entire application stack, process managers such as foreman, hivemind, or overmind are the ideal companions for containerized Rails applications. With platforms like Heroku, Deis or Dokku offering native Procfile support and fine-grained application scaling per process, running scheduler processes alongside our worker and web processes makes total sense.
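Because a rescheduled container can briefly leave two scheduler copies running, one defensive option is to let each copy claim a run slot before enqueuing. The following is a minimal sketch of that idea, not a feature of any of the gems above: `SlotRegistry`, `slot_key` and `enqueue_once` are hypothetical names, and the in-memory set only stands in for a store shared by all containers, such as Redis or a database table with a unique index.

```ruby
require 'set'

# In-memory stand-in for a shared store. A real deployment would need a
# store that all containers can reach; this class only illustrates the idea.
class SlotRegistry
  def initialize
    @claimed = Set.new
    @mutex = Mutex.new
  end

  # Returns true only for the first caller that claims a given key.
  def claim(key)
    @mutex.synchronize { !@claimed.add?(key).nil? }
  end
end

# Rounds a time down to the start of its interval, so every scheduler copy
# derives the same key for the same run.
def slot_key(job_name, time, interval_seconds)
  slot = time.to_i - (time.to_i % interval_seconds)
  "#{job_name}:#{slot}"
end

# Yields (to enqueue the job) only if this run slot has not been claimed yet.
def enqueue_once(registry, job_name, time, interval_seconds)
  return false unless registry.claim(slot_key(job_name, time, interval_seconds))

  yield
  true
end

registry = SlotRegistry.new
enqueued = 0
enqueue_once(registry, 'ldap.sync', Time.at(600), 300) { enqueued += 1 } # first claim
enqueue_once(registry, 'ldap.sync', Time.at(650), 300) { enqueued += 1 } # same slot, skipped
enqueue_once(registry, 'ldap.sync', Time.at(900), 300) { enqueued += 1 } # next slot
puts enqueued
```

The same reasoning applies to the jobs themselves: since a worker can be interrupted mid-run and the job retried, it pays to write jobs so that running them twice does no harm.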

How can we help?

If you have any questions about KUY.io or are looking for advice on implementing background job processing workloads in your next project, please feel free to reach out to us!

Nicolas Bettenburg, CEO and Founder of KUY.io
