Mysql Replication Adapter bug with Rails query cache

2009 January 13

[Note: this post relates to our using Rails 2.0.2. The fix provided below may not work in other versions of Rails.]
We recently started using the Mysql Replication Adapter from the RapLeaf guys so that we could offload some queries to our MySQL slaves.

The adapter is easy to install, configure and use, which is why we chose it over other possibilities.

However, today we found a little bug, and we traced it to the interaction of the Rails 2 query cache with the mysql_replication adapter. The short version is that when using the mysql_replication adapter, the query cache doesn’t get purged even if you perform an insert/update/delete operation within your Rails action. With the normal adapter, any operation that modifies data clears the query cache. The bug we were seeing looked something like this:


  items = user.items
  # items is empty because user is new
  user.items << Item.new(...)
  ...
  items = user.items(true)
  ## ACK!! items is STILL empty. We should see that item we just created.

So even though we were passing force_reload=true to our association finder, the query was still hitting the query cache which had previously returned an empty set.

The mysterious alias_method_chain

Turns out that the problem is the result of different approaches to class extension used by Rails core versus the Rapleaf adapter. Rails relies heavily on the alias_method_chain approach. This approach replaces method A in a class with a new method B: the old method is renamed, and the new method takes over the name ‘A’. The new method is then responsible for invoking the old one. The result is that when I call ‘A’ on the class, the new method runs first and then calls the old method.
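To make the renaming concrete, here is a minimal plain-Ruby sketch of what a call like alias_method_chain :insert, :query_dirty boils down to. FakeAdapter and its log are hypothetical stand-ins made up for illustration, not Rails code.

```ruby
# Plain-Ruby sketch of the alias_method_chain pattern.
# FakeAdapter is a hypothetical stand-in, not a Rails class.
class FakeAdapter
  attr_reader :log

  def initialize
    @log = []
  end

  # The original method "A".
  def insert(sql)
    @log << "insert: #{sql}"
  end

  # The new method "B": does its extra work, then calls the original.
  def insert_with_query_dirty(sql)
    @log << "clear query cache"
    insert_without_query_dirty(sql)
  end

  # These two lines are the heart of the chaining trick:
  alias_method :insert_without_query_dirty, :insert # keep the old A under a new name
  alias_method :insert, :insert_with_query_dirty    # the name A now points at B
end

adapter = FakeAdapter.new
adapter.insert("INSERT INTO items ...")
puts adapter.log
```

Callers still just call insert, but the cache-clearing step now runs first.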

Rails uses this technique to implement the query cache. In particular, it method-chains each of the “insert”, “update”, and “delete” methods on the MysqlAdapter class to call a method to purge the query cache.

Now the RapLeaf code uses what I consider a more “traditional” approach (or maybe you’d call it a Java-like approach) to class extension, which is it subclasses the MysqlAdapter class. In the subclass, a number of methods are overridden. In particular, the same “insert”, “update”, and “delete” methods are overridden.

So what happens when you call MysqlReplicationAdapter.insert? Turns out you just call the specialized version of the method, and the query-cache method chaining stuff in the parent class gets ignored. Basically the two versions of class extension aren’t cooperating.

Now the proper fix is probably to re-code the replication adapter to use method chaining instead of subclassing. But that’s not a trivial task! Fortunately, I was able to more easily fix the problem by just replacing the “insert/update/delete” methods of the replication adapter with these methods:


      def insert_sql(sql, name = nil, pk = nil, id_value = nil, sequence_name = nil) #:nodoc:
        ensure_master
        super sql, name
        id_value || @connection.insert_id
      end

      def update_sql(sql, name = nil) #:nodoc:
        ensure_master
        super
        @connection.affected_rows
      end

      def delete_sql(sql, name = nil)
        update_sql(sql, name)
      end

These XX_sql methods are sort of the internal versions of the public-facing methods. Fortunately it’s the public-facing ones that query_cache method-chains, so these internal ones can be safely overridden.
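Here is a small sketch of why overriding the internal method plays nicely with the chain while overriding the public one does not. All class names are hypothetical, and it assumes the subclass override of the public method never calls back into the chained parent version.

```ruby
# Hypothetical stand-ins for the adapter classes; only the shape matters.
class BaseAdapter
  attr_reader :cache_clears

  def initialize
    @cache_clears = 0
  end

  # Public method: delegates the real work to the internal _sql method.
  def insert(sql)
    insert_sql(sql)
  end

  def insert_sql(sql)
    "base ran: #{sql}"
  end

  # The query-cache chaining wraps the PUBLIC method only.
  def insert_with_query_dirty(sql)
    @cache_clears += 1
    insert_without_query_dirty(sql)
  end
  alias_method :insert_without_query_dirty, :insert
  alias_method :insert, :insert_with_query_dirty
end

# Overrides the public method without calling super: the chain is bypassed.
class OverridesPublic < BaseAdapter
  def insert(sql)
    "replica-aware: #{sql}"
  end
end

# Overrides only the internal method: the chained public method still runs.
class OverridesInternal < BaseAdapter
  def insert_sql(sql)
    "replica-aware: #{sql}"
  end
end

a = OverridesPublic.new
a.insert("INSERT ...")
puts a.cache_clears # 0, cache never cleared

b = OverridesInternal.new
b.insert("INSERT ...")
puts b.cache_clears # 1, cache cleared before the insert
```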

Better configuration for your Rails app survives reload!

2009 January 9
by scottp

Historically we’ve added ad-hoc config for our Rails app by jamming simple Ruby constants in our environment.rb files. Like this:


RECAPTCHA_PUBLIC_KEY = 'foobar'

Ick! We’ve quickly grown to have more than ten of these, and they are really yucky. So recently we’ve tried cleaning these up by adding our own simple configuration class:


module Remixation
  class Configuration
    cattr_accessor :recaptcha_key
  end
end

Ah…nice. Now we can change our config files to:


Remixation::Configuration.recaptcha_key = 'foobar'

This seems cleaner. It’s easier to manage our various config values since they are all defined in one class. It also means we can update values without generating the dreaded Ruby “already initialized constant” warning.

Initially I had put this class under lib, but we quickly ran into a problem. Every time the app reloaded (which in dev is every request) our values would get wiped out. This is because the class def gets reloaded, but our environment statements to set the values don’t get re-run.
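The effect is easy to reproduce outside Rails. The sketch below imitates a reload by removing and re-creating a class; AppConfig and define_app_config are hypothetical names standing in for our configuration class and for Rails re-loading its file.

```ruby
# Sketch of why values set once at boot vanish after a development-mode reload.
# AppConfig is a hypothetical stand-in for the configuration class, and
# define_app_config imitates Rails re-loading the file that defines it.
def define_app_config
  Object.send(:remove_const, :AppConfig) if Object.const_defined?(:AppConfig)
  Object.const_set(:AppConfig, Class.new do
    class << self
      attr_accessor :recaptcha_key
    end
  end)
end

define_app_config                   # first load of the class
AppConfig.recaptcha_key = 'foobar'  # environment.rb runs once, at boot
puts AppConfig.recaptcha_key.inspect

define_app_config                   # reload: a brand-new class object
puts AppConfig.recaptcha_key.inspect # nil, the boot-time value is gone
```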

Our eventual solution involved moving our class into RAILS_ROOT/config/remixation, then adding these lines to our environment.rb config block:


config.load_paths += %W( #{RAILS_ROOT}/config/remixation )
config.load_once_paths += %W( #{RAILS_ROOT}/config/remixation )

These tell Rails (v2.1) NOT to reload classes from this new directory. I’d be happy to hear if people have found more elegant ways of handling their app config.

Rails – your pages are still slow! (part 2)

2008 May 21
by scottp

Here at Vodpod we like to think that we’re clever, but that doesn’t mean we’re smart!

Back in January we posted about how to modify Rails to fix a problem with asset tags. The problem was that asset tags coming from different servers would get different ‘asset codes’ appended to them, and thus look like distinct files to the browser. We solved that problem so that the browser sees a consistent URL for files that are the same. I even set myself up with this nice claim at the end:

“In between deploys the browser can happily cache our assets.”

Ah, how misguided we were four months ago. The problem is that our static files didn’t have the proper cache expiry headers to allow the browser to cache them. Within a session, if you’re lucky, your browser may cache a static file or two. But without explicitly telling the browser that the file doesn’t need to be refetched any time soon, it will pretty quickly go back and ask for that file again.

The solution is simple: you need to add the ‘Expires’ and ‘Cache-Control’ headers to your static resources. These tell the browser that your resource will not change for X amount of time, and thus the browser can safely cache it until then. Now since Rails is gonna change the cache-buster code every time we deploy, this expiry time can in effect be infinite.

Now this seems like the kind of thing you would hope that Rails would give you out-of-the-box. The problem is that Rails doesn’t expect to serve static files; that is supposed to be done by your web server. We use Apache, and Apache makes it very easy to add these headers. But it doesn’t do it automagically; you gotta configure it.

Step 1. Make sure you’ve got mod_expires available. I had to compile the .so and load it in httpd.conf:

LoadModule expires_module modules/mod_expires.so

Step 2. Tell Apache to add the expiry headers

ExpiresActive On
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/x-javascript "access plus 1 month"

You may have to play with the mime types depending on your setup. These directives are for Apache 2.2.x. Check the manual if you’re running something different.
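To get a feel for what the browser ends up seeing, here is a rough Ruby sketch that computes the two header values for “access plus 1 month”. It assumes a month is counted as 30 days, and expiry_headers is a hypothetical helper, so check the real response headers from your own server.

```ruby
require 'time' # adds Time#httpdate

# Rough sketch of the two headers produced for "access plus 1 month".
# Assumption: a month counts as 30 days.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# expiry_headers is a hypothetical helper, not part of Apache or Rails.
def expiry_headers(access_time, max_age = SECONDS_PER_MONTH)
  {
    'Expires'       => (access_time + max_age).httpdate,
    'Cache-Control' => "max-age=#{max_age}"
  }
end

headers = expiry_headers(Time.utc(2008, 5, 21, 12, 0, 0))
headers.each { |name, value| puts "#{name}: #{value}" }
```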

Nice and speedy. This only affects load times for repeat visitors, but assuming you’ve got a lot of those, this should reduce overall connections quite a bit. I strongly recommend people read through the official documentation of the cache control headers: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.
It’s lovely nighttime reading!

Loading a Rails session by hand – Flash uploads

2008 March 18
by scottp

Here is some code I worked out a while ago for loading a Rails session manually inside an action.

This is useful for example when doing uploads from Flash, since the Flash runtime does not pass on the Rails session id cookie properly. So instead you have to pass the session id as a normal parameter, and load the session by hand.


opts = {:session_key => 'session', :session_id => params[:session]}
opts = opts.merge(request.class.const_get('DEFAULT_SESSION_OPTIONS'))
sess_opts = opts.inject({}) { |options, (k,v)| options[k.to_s] = v; options }
real_session = CGI::Session.new(request, sess_opts)

Update

Here’s another approach on solving this problem by patching CGI::Session:

http://blog.inquirylabs.com/2006/12/09/getting-the-_session_id-from-swfupload/

Rails – why your pages load so slowly!

2008 January 16
by scottp

A while back we spent some time optimizing the load speed for our main video page. One thing we noticed was that Rails has this habit of tacking on a “cache buster” integer to the end of static asset paths when you use one of the asset tag helpers like “javascript_include_tag”. The problem was that the cache buster integer changed as we visited the same page.

Well if you go look at the source code, the reason is clear:


      asset_id = rails_asset_id(source)
      source << '?' + asset_id


      def rails_asset_id(source)
        ENV["RAILS_ASSET_ID"] ||
          File.mtime("#{RAILS_ROOT}/public/#{source}").to_i.to_s rescue ""
      end

So in its wisdom, Rails uses the mtime of the file to generate the ‘asset tag id’. This has the nice effect that whenever the file changes, you get a new id for the file and the browser knows it needs to reload it. The problem is that we run a cluster of web servers, and the same file can have a different mtime on each one, so each server generates a different id. The same file actually appears as different files from each server.

The net result is to kill page load times, especially if your page has multiple Javascript files. The browser keeps seeing a different file, so it can’t just use the one it has cached.
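The problem is easy to reproduce with two temp directories standing in for two web servers. This is only a sketch of the mtime logic quoted above; asset_url, urls_from_two_servers, and the file names are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Sketch of the mtime-based asset id, run against two temp directories
# standing in for two web servers. All names here are hypothetical.
def asset_url(public_dir, source, env = {})
  asset_id = env['RAILS_ASSET_ID'] ||
             File.mtime(File.join(public_dir, source)).to_i.to_s
  "#{source}?#{asset_id}"
end

def urls_from_two_servers(env = {})
  Dir.mktmpdir do |root|
    %w[web1 web2].each_with_index.map do |server, i|
      dir = File.join(root, server)
      FileUtils.mkdir_p(dir)
      path = File.join(dir, 'app.js')
      File.write(path, 'alert(1);')                 # identical content everywhere
      deployed_at = Time.at(1_200_000_000 + i * 60) # deployed a minute apart
      File.utime(deployed_at, deployed_at, path)
      asset_url(dir, 'app.js', env)
    end
  end
end

p urls_from_two_servers                              # two different URLs
p urls_from_two_servers('RAILS_ASSET_ID' => 'r1234') # same URL from both
```

With an explicit asset id, both “servers” hand out the same URL, which is what lets the browser reuse its cached copy.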

Fortunately Rails gives you an out: you can set the asset ID yourself using the RAILS_ASSET_ID environment variable. But how do we set this value? Well, we want it to change whenever the file changes. A good proxy is the latest SVN revision.

So we wrote a rake task to dump the SVN revision to a file:


  desc "Writes latest svn update number to config/svn_version for use as asset tag id"
  task(:svn_version => :environment) do
    lines = `svn log -r HEAD`
    if lines =~ /(r\d+)/
      # Block form closes the file even if the write raises
      File.open("config/svn_version", "w") { |f| f.write($1) }
    end
  end

Now we just add some code to environment.rb to read this file on startup and set ENV["RAILS_ASSET_ID"]:


# Setup the ENV["RAILS_ASSET_ID"] so that our resources look the same on every machine. This
# assumes that rake remix:svn_version has been run on each machine
if File.exist?("config/svn_version")
  File.open("config/svn_version", "r") {|f| ENV["RAILS_ASSET_ID"] = f.readline }
end

Finally, we added a call to our remix:svn_version task to our deploy scripts. Now whenever we deploy, we rev the asset id and the browser knows to reload files that (may) have changed. In between deploys the browser can happily cache our assets.

Blog Importer

2008 January 10
by scottp

There’s a new feature on Vodpod that allows you to keep your Vodpod account in sync with your blog. Simply click on “Add a video to this pod” from your pod’s home page, then fill in the field that asks for your blog’s web address:

[Screenshot: Blog Importer]

After that any videos that you post to the specified blog will be imported into your pod. To turn off the importing, you can go into your pod settings and delete the pod from the list of imports. Blog importing will work with any blog that advertises an RSS feed, which should be most of them.

Rails and Memcache timeouts

2007 December 14
by scottp

Lately we’ve been having trouble with our Rails processes hanging when talking to memcached. The symptom was Mongrel killing our request after 60 secs, and the stack trace would show we were hung up trying to talk to memcache.

This seems to only happen when our db is slow in responding. Seems like perhaps dead sockets to memcache were not getting cleaned up.

Anyway, I found that the Rapleaf guys had seen similar trouble. They described adding timeouts to the memcache-client to fix the problem.

So we tried something similar and it did in fact seem to help. I have attached the modified memcache-client code to this post. The modifications are mostly at the top. I added a facade TCPTimeoutSocket class which wraps the normal TCPSocket class with timeouts.
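For readers who just want the idea, here is a minimal sketch of such a facade. It is a hypothetical reconstruction using Ruby’s standard Timeout module, not the code attached to this post, and the method list is deliberately short.

```ruby
require 'socket'
require 'timeout'

# Hypothetical reconstruction of the idea, not the attached code:
# a facade that forwards to a TCPSocket but bounds each call with a timeout.
class TCPTimeoutSocket
  def initialize(host, port, timeout_secs)
    @timeout = timeout_secs
    @sock = Timeout.timeout(@timeout) { TCPSocket.new(host, port) }
  end

  def write(data)
    Timeout.timeout(@timeout) { @sock.write(data) }
  end

  def gets
    Timeout.timeout(@timeout) { @sock.gets }
  end

  def read(length)
    Timeout.timeout(@timeout) { @sock.read(length) }
  end

  def close
    @sock.close
  end
end

# Demo against a local server that accepts the connection but never replies.
server = TCPServer.new('127.0.0.1', 0)
sock = TCPTimeoutSocket.new('127.0.0.1', server.addr[1], 0.2)
begin
  sock.gets
rescue Timeout::Error
  puts 'gave up after 0.2s instead of hanging'
ensure
  sock.close
  server.close
end
```

The caller gets a Timeout::Error it can rescue, rather than a request that sits there until Mongrel kills it.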

Update 2009: My code has by now made it into the Rails core memcache client. It’s standard enough now that people complain about its performance overhead.

Using Amazon S3 with the rails file_column plugin

2007 August 17
by scottp

A while back we hacked the Rails file_column plugin to support storing files up at Amazon S3.

Seems like file_column isn’t much used these days, so I’m just offering up our code as is.

You will need to:

  • Define the constants AWS_S3_KEY and AWS_S3_SECRET
  • Also have the AWS::S3 plugin for accessing Amazon
  • Include the option :use_s3 => true in your file_column declarations
  • Drop in our hacked file_column.rb from the bottom of this post

Getting the S3 URL back out is not automatic. We added this common method to ActiveRecord::Base:


  def s3_bucket_url(attr)
    name = "http://s3.amazonaws.com/vodpod.com.#{self.class.to_s.tableize}.#{attr.to_s}"
    name << '.dev' if RAILS_ENV=='development'
    name
  end

and implement model-specific methods for getting to the file_column attribute, like this one:

  def thumbnail_url(size = nil)
    if !size
      return "#{self.s3_bucket_url(:thumbnail)}/#{self.id}.jpg"
    else
      return "#{self.s3_bucket_url(:thumbnail)}/#{self.id}.#{size.to_s}.jpg"
    end
  end

file_column.rb

Background Processing in Rails

2007 August 17
by scottp

Probably the most important thing I’ve learned about using Mongrel is don’t be slow! Actions that take a long time (like greater than 5 seconds) will kill throughput since all other Rails actions will be queued up behind that slow one.

So the general advice is to perform long running tasks “in the background”. Ok, fine, we’ve done lots of that. But sometimes you have a task that essentially needs to be synchronous for your user, even though it takes a long time. In our case, whenever someone uploads a video to Vodpod, we want them to actually wait while we process the video so they can choose their favorite thumbnail. Now, many people suggest using BackgrounDRb for this, but that thing seems like overkill. It creates something like 3 daemon processes and requires DRb for communication. I wanted something simpler that would just use the db for communication.

So what we implemented is what I call “pseudo-synchronous” tasks. The basic flow is pretty easy:

-User makes initial request
-server creates a “background job” and stuffs it in the db
-request returns

-User goes to “progress” page, periodic Ajax call checks progress
-server checks progress of background job in the db
-when job is done, then page does Ajax call to show the completed data

So we run the job asynchronously to the mongrel processing, but use Ajax to indicate progress to the user.

Now here’s the trick. Rather than having a separate process to run the background job, we use fork to clone our mongrel process and have the child process run the background task. The advantages of forking include:

  • fast – fork happens very quickly at the OS level. there’s no app startup time
  • easy – All our current state is preserved, so we don’t need to pass arguments to some script
  • local – we know the child process runs on the same machine, so if we need access to some local resource, like a file, we know it will be there. In a clustered environment it can be tricky to make sure that background processing has access to particular resources

There’s one big disadvantage to using fork – the child process basically wrecks our ActiveRecord database connection. AR stores the database connection in a static variable, and so the child process re-uses that connection. This causes problems since AR is not designed to have multiple processes using the same connection.

To get around the ActiveRecord problem, we have to have the child process create its own db connection, and we have to have the parent process close and re-open its connection. Altogether the code for forking the child and managing the connections looks like this:


class BackgroundJob < ActiveRecord::Base
  # Spawn a new background process to execute this background job immediately. We have
  # to muck with re-creating our ActiveRecord connections because AR doesn't normally survive fork.
  # I wonder what else craps out...
  def spawn
    self.reload
    if self.status == nil || self.status == STATUS_NEW
      dbconfig = ActiveRecord::Base.remove_connection
      pid = fork do
        begin
          # Monkey-patch Mongrel to not remove its pid file in the child
          require 'mongrel'
          Mongrel::Configurator.class_eval("def remove_pid_file; puts 'child no-op'; end")
          ActiveRecord::Base.establish_connection(dbconfig)
          run
        ensure
          ActiveRecord::Base.remove_connection
        end
      end
      Process.detach(pid)
      ActiveRecord::Base.establish_connection(dbconfig)
    end
  end

end

I’ve coded the fork call onto the BackgroundJob model class. This makes it super easy to create the background job and run it. Now my initial action looks like this:


  background_job = BackgroundJob.create(Video, 'static_process_file', @video.id, ftp_file_name, current_user.id)
  background_job.spawn

The method that the child will call is the “static_process_file” method on the Video model class. Note that I don’t actually have to use a static method at all, I could actually pass in a Proc or an object and a method to run. This makes it really easy to take some long-running code you’ve got and split it off into the background process.

Now my Ajax-called action is easy:


  def get_job_status
    job = BackgroundJob.find(params[:id])
    render :text => {:status => job.status, :message => job.message}.to_json
  end

When I get job.status == “complete” then I have another Ajax call to retrieve the results of the background job (in my case a set of thumbnails extracted from a video).

I’m not 100% comfortable with the fork due to the problems with the db connection. I’m not sure I would want to run that code really frequently. In my case it only runs perhaps a hundred times per day. Nonetheless, we have this code running in production and I haven’t seen any problems. If anyone else has a better work-around for the db connection issue I’d love to hear it.

Update

Thanks to some awesome comments, I’ve updated the code to fix two problems. Tom suggested using “Process.detach” to prevent the child process from hanging around as a zombie. I’ve also added a bit of code to monkeypatch Mongrel so that the child process doesn’t remove the parent’s PID file. Obviously you want to remove this code if you’re not using Mongrel.

And an even bigger bonus, Tom created a whole plugin to handle the process forking. So check it out!

Update 2

Ruby-god Tom Anderson has a tricky “exit!” call at the end of his fork handler (in the child). This call ends the process without invoking at_exit handlers, which is what Mongrel uses to remove its PID file. This is probably safer than monkeypatching Mongrel as I’ve done. Not sure if there might be other at_exit handlers you would *want* to run, but given how the child process just has copies of resources from the parent, probably avoiding all handlers is a good idea.
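The difference between a normal exit and “exit!” in a forked child can be seen with a few lines of plain Ruby. This is a Unix-only sketch, and child_ran_at_exit? is a hypothetical helper written just for this illustration.

```ruby
# Sketch: does an at_exit handler run in a forked child? Unix-only (needs fork).
# child_ran_at_exit? is a hypothetical helper written for this illustration.
def child_ran_at_exit?(hard_exit)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    at_exit { writer.write('handler ran') } # stands in for Mongrel's PID cleanup
    exit!(0) if hard_exit                   # leave now, skipping at_exit handlers
    # falling off the end of the block exits normally and runs the handlers
  end
  writer.close
  Process.wait(pid)
  output = reader.read
  reader.close
  output == 'handler ran'
end

puts child_ran_at_exit?(false) # true: normal exit runs the handler
puts child_ran_at_exit?(true)  # false: exit! skips it
```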

Life without Capistrano

2007 August 17
by scottp

I know it’s heresy in the Rails community, but I don’t like Capistrano.

Despite its claims to simplicity, I don’t find it simple at all. It seems to have a rather hardened model of what “deployment” means, and as soon as you want to do something different then it gets really messy really quickly.

One good example: we have our web servers behind a hardware load-balancer. This means that we can easily have one server down for a short time without any noticeable interruption on the site. So the ideal deployment approach for us is a “rolling deployment”: upgrade and restart each server in series. But Capistrano seems totally wedded to parallel deployment. This means that all the hosts get upgraded/restarted at the same time. What happens if something goes wrong (like an svn conflict in a config file; it’s happened!)? Your whole site is hosed because every machine got upgraded at once.

The reality is that the deployment steps are easy. They’re just some simple svn/mongrel admin commands sent over ssh. So we decided to roll our own Rake tasks to handle deployment. Basically these tasks just exec “ssh” with the commands we want to run. (I’ve attached our rake script at the bottom of this post. You’ll want to edit it for your needs, but it’s got some useful bits like running a remote “sudo” command and checking for mongrel file uploads.)

Now when I want to update, I just run:

rake remix:update_all_webs

This uses ssh remote commands to send to each server in series:

  • Check that mongrel is not processing any uploads
  • Update svn, check for any conflicts
  • Restart apache and mongrels

Note that these rake tasks assume key-based access to the target machines, so you need to set up authorized keys between the deployment server and your targets. The script does ask for a password, and this password is used to restart apache using the sudo command.

The trickiest part of this script is the remote “sudo” command. There’s no way to provide sudo the password on the command line, so I have to feed it to its stdin. Here’s the ruby snippet:

  def remote_sudo(server, cmd, password)
    puts "[sudo, #{server}] #{cmd}"
    result = ""
    IO.popen("ssh -T #{server}", "r+") do |io|
      io.puts("sudo -S #{cmd}")
      io.puts(password)
      io.puts("sudo -k")
      io.close_write
      result = io.readlines
    end
    result
  end

Resources

remix.rake

Here’s another article on using ssh for remote scripting, although they use shell script instead of ruby:

DeveloperWorks article on SSH.