scottp August 17th, 2007
Probably the most important thing I’ve learned about using Mongrel is: don’t be slow! Actions that take a long time (say, more than 5 seconds) will kill throughput, since a Mongrel process runs only one Rails action at a time and every other request to that process queues up behind the slow one.
So the general advice is to perform long-running tasks “in the background”. OK, fine, we’ve done lots of that. But sometimes you have a task that essentially needs to be synchronous for your user, even though it takes a long time. In our case, whenever someone uploads a video to Vodpod, we want them to actually wait while we process the video so they can choose their favorite thumbnail. Now, many people suggest using BackgrounDRb for this, but that seems like overkill: it creates something like 3 daemon processes and requires DRb (Distributed Ruby) for communication. I wanted something simpler that would just use the db for communication.
So what we implemented is what I call “pseudo-synchronous” tasks. The basic flow is pretty easy:
- User makes the initial request
- Server creates a “background job” and stuffs it in the db
- Request returns
<-- server runs background job -->
- User goes to the “progress” page, where a periodic Ajax call checks progress
- Server checks the progress of the background job in the db
- When the job is done, the page makes an Ajax call to show the completed data
So we run the job asynchronously to the Mongrel request processing, but use Ajax to indicate progress to the user.
Now here’s the trick. Rather than having a separate process to run the background job, we use fork to clone our mongrel process and have the child process run the background task. The advantages of forking include:
- fast - fork happens very quickly at the OS level, so there’s no app startup time
- easy - All our current state is preserved, so we don’t need to pass arguments to some script
- local - we know the child process runs on the same machine, so if we need access to some local resource, like a file, we know it will be there. In a clustered environment it can be tricky to make sure that background processing has access to particular resources
There’s one big disadvantage to using fork - the child process basically wrecks our ActiveRecord database connection. AR stores the database connection in a static variable, and so the child process re-uses that connection. This causes problems since AR is not designed to have multiple processes using the same connection.
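To get a feel for why a shared handle is trouble, here is a small sketch that uses a plain file instead of a database connection. It is only an analogy (the real ActiveRecord problem involves a database socket, and the method name `parent_read_after_child` is made up for this example), but it shows that a forked child and its parent operate on the very same open handle rather than on independent copies:

```ruby
require 'tempfile'

# Returns what the parent reads after a forked child has already read
# from the same inherited file handle.
def parent_read_after_child(contents)
  Tempfile.create('fork_demo') do |tmp|
    tmp.write(contents)
    tmp.flush
    File.open(tmp.path, 'r') do |shared|
      pid = fork do
        shared.sysread(5)   # child consumes the first 5 bytes
        exit!(0)            # leave at once, skipping inherited cleanup code
      end
      Process.wait(pid)     # make sure the child has finished reading
      shared.sysread(5)     # parent continues from the child's position
    end
  end
end

puts parent_read_after_child('0123456789')  # the parent sees "56789"
```

The parent never read the first five bytes, yet its next read starts after them, because the read position belongs to the shared handle. Two processes talking over one database connection can interfere with each other in a similar way.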
To get around the ActiveRecord problem, we have to have the child process create its own db connection, and we have to have the parent process close and re-open its connection. Altogether the code for forking the child and managing the connections looks like this:
class BackgroundJob < ActiveRecord::Base
  # Spawn a new background process to execute this background job immediately. We have
  # to muck with re-creating our ActiveRecord connections because AR doesn't normally survive fork.
  # I wonder what else craps out...
  def spawn
    self.reload
    if self.status == nil || self.status == STATUS_NEW
      dbconfig = ActiveRecord::Base.remove_connection
      pid = fork do
        begin
          # Monkey-patch Mongrel to not remove its pid file in the child
          require 'mongrel'
          Mongrel::Configurator.class_eval("def remove_pid_file; puts 'child no-op'; end")
          ActiveRecord::Base.establish_connection(dbconfig)
          run
        ensure
          ActiveRecord::Base.remove_connection
        end
      end
      Process.detach(pid)
      ActiveRecord::Base.establish_connection(dbconfig)
    end
  end
end
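If you want to play with the fork-and-detach pattern outside of Rails, something like the following sketch should work. Everything in it is an assumption made for illustration: `FileJob` is a made-up class, and a JSON file stands in for the background_jobs row, so there are no database connections to juggle. The `exit!` at the end of the child is there so the child does not run exit handlers copied from the parent.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical stand-in for a background_jobs row: the status lives in a JSON file.
class FileJob
  def initialize(path)
    @path = path
    save('new', nil)
  end

  def save(status, message)
    tmp = "#{@path}.#{Process.pid}.tmp"
    File.write(tmp, JSON.generate('status' => status, 'message' => message))
    File.rename(tmp, @path)   # rename so a reader never sees a half-written file
  end

  def read_state
    JSON.parse(File.read(@path))
  end

  # Fork a child to run the given block; the parent returns right away.
  def spawn
    pid = fork do
      begin
        save('running', nil)
        save('complete', yield.to_s)
      rescue StandardError => e
        save('failed', e.message)
      ensure
        exit!(0)              # skip at_exit handlers inherited from the parent
      end
    end
    Process.detach(pid)       # reaps the child so it does not linger as a zombie
  end
end

job = FileJob.new(File.join(Dir.mktmpdir, 'job.json'))
waiter = job.spawn { 6 * 7 }
waiter.join                   # demo only: a web request would return without waiting
puts job.read_state['status']
```

`Process.detach` returns a thread that waits on the child, which is why the demo can `join` it. In a real request you would ignore that return value and let the progress page do the polling.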
I’ve coded the fork call onto the BackgroundJob model class. This makes it super easy to create the background job and run it. Now my initial action looks like this:
  background_job = BackgroundJob.create(Video, 'static_process_file', @video.id, ftp_file_name, current_user.id)
  background_job.spawn
The method that the child will call is the “static_process_file” method on the Video model class. Note that I don’t actually have to use a static method at all, I could actually pass in a Proc or an object and a method to run. This makes it really easy to take some long-running code you’ve got and split it off into the background process.
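The actual background_job.rb isn’t reproduced in this post, so the following is only a guess at the general idea: store the class name, method name and arguments as plain values (which fit nicely in db columns), then look the class up and call the method when the job runs. `JobSpec` and `DemoVideo` are hypothetical names invented for this sketch.

```ruby
# Hypothetical job record: remembers a class name, a method name and arguments.
JobSpec = Struct.new(:class_name, :method_name, :args) do
  # Look the class up by name and call its class-level method with the arguments.
  def run
    Object.const_get(class_name).public_send(method_name, *args)
  end
end

# Hypothetical stand-in for the Video model.
class DemoVideo
  def self.static_process_file(video_id, file_name, user_id)
    "video #{video_id}: processed #{file_name} for user #{user_id}"
  end
end

spec = JobSpec.new('DemoVideo', 'static_process_file', [7, 'clip.mov', 42])
puts spec.run  # prints "video 7: processed clip.mov for user 42"
```

Keeping only names and simple arguments in the record means the job can be described entirely by what is in the db, which is handy when the progress page needs to look it up later.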
Now my Ajax-called action is easy:
  def get_job_status
    job = BackgroundJob.find(params[:id])
    render :text => {:status => job.status, :message => job.message}.to_json
  end
When I get job.status == “complete” then I have another Ajax call to retrieve the results of the background job (in my case a set of thumbnails extracted from a video).
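The polling itself happens in the browser, but the loop is simple enough to sketch in Ruby. In this sketch the block stands in for the Ajax request to get_job_status, the method name `poll_until_done` is made up, and the “failed” status is an assumption of mine (the post only mentions new and complete):

```ruby
# Poll a status source until it reports a terminal state or the attempts run out.
# The block stands in for the Ajax request to get_job_status.
def poll_until_done(max_attempts: 20, interval: 0.5)
  max_attempts.times do
    state = yield
    # 'failed' is a hypothetical status; the post only names new and complete.
    return state if %w[complete failed].include?(state[:status])
    sleep interval
  end
  nil  # gave up: the job never reached a terminal state
end

# Demo with a canned sequence of responses instead of a real server.
responses = [
  { status: 'new', message: nil },
  { status: 'running', message: 'extracting thumbnails' },
  { status: 'complete', message: 'done' }
]
final = poll_until_done(interval: 0) { responses.shift }
puts final[:status]  # prints "complete"
```

Capping the number of attempts matters: if the child process dies without updating the job, the page should eventually stop polling and tell the user something went wrong.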
I’m not 100% comfortable with the fork due to the problems with the db connection. I’m not sure I would want to run that code really frequently. In my case it only runs perhaps a hundred times per day. Nonetheless, we have this code running in production and I haven’t seen any problems. If anyone else has a better work-around for the db connection issue I’d love to hear it.
Update
Thanks to some awesome comments, I’ve updated the code to fix two problems. Tom suggested using “Process.detach” to prevent the child process from hanging around as a zombie. I’ve also added a bit of code to monkeypatch Mongrel so that the child process doesn’t remove the parent’s PID file. Obviously you want to remove this code if you’re not using Mongrel.
And an even bigger bonus, Tom created a whole plugin to handle the process forking. So check it out!
Update 2
Ruby-god Tom Anderson has a tricky “exit!” call at the end of his fork handler (in the child). This call ends the process without invoking at_exit handlers, which is what Mongrel uses to remove its PID file. This is probably safer than monkeypatching Mongrel as I’ve done. Not sure if there might be other at_exit handlers you would *want* to run, but given how the child process just has copies of resources from the parent, probably avoiding all handlers is a good idea.
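The difference between exit and exit! is easy to see in a small experiment. This sketch is not Tom’s code; it is a minimal demo (with a made-up method name, `at_exit_output`) in which the child registers an at_exit handler that writes to a pipe, and the parent checks whether anything arrived:

```ruby
# Fork a child that registers an at_exit handler, then leaves via the given
# method (:exit or :exit!). Returns whatever the handler wrote to the pipe.
def at_exit_output(leave_with)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    at_exit do
      writer.write('handler ran')
      writer.flush
      exit!(0)  # stop here so handlers inherited from the parent do not also run
    end
    leave_with == :exit! ? exit!(0) : exit(0)
  end
  writer.close          # parent keeps only the read end
  Process.wait(pid)
  reader.read           # empty string if the handler never ran
ensure
  reader.close if reader && !reader.closed?
end

puts at_exit_output(:exit).inspect   # prints "handler ran"
puts at_exit_output(:exit!).inspect  # prints ""
```

With plain exit the handler fires in the child; with exit! it never runs. That is the behavior that keeps a forked child from deleting the parent’s PID file.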
Resources
Here are the files for the BackgroundJob model class and migration. Edit to fit your needs.
100_create_background_jobs.rb
background_job.rb