Hacking Solr with Rails - part 2
scottp March 9th, 2009
In our last exciting episode, we talked about getting Solr running with Rails. My specific requirement was to generate a list of videos related to another video.
Fortunately, Solr makes this super easy with its MoreLikeThis handler. Building on a class from the Lucene library, that handler lets you say, “given doc X in my index, return my other docs like that one”. It does this by building a query from the terms in your subject doc. The query it builds is smart about weighing the tf/idf of each term, favoring terms that are frequent in the subject doc but rare across the index, so you get good results.
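To make the tf/idf idea a bit more concrete, here is a rough sketch of how "interesting" terms might be picked from a subject doc. This is not the actual Lucene code: the class name InterestingTerms, the topTerms method, and the smoothed idf formula are my own assumptions for illustration, and docs are just lists of already-tokenized words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy version of tf/idf term selection, not Lucene's implementation.
public class InterestingTerms {

    /** Scores each term of the subject doc by tf * idf and returns the n best terms. */
    static List<String> topTerms(List<String> subjectDoc, List<List<String>> corpus, int n) {
        // Term frequency: how often each term occurs in the subject doc.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : subjectDoc) {
            tf.merge(term, 1, Integer::sum);
        }

        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> entry : tf.entrySet()) {
            // Document frequency: how many docs in the corpus contain the term.
            int docFreq = 0;
            for (List<String> doc : corpus) {
                if (doc.contains(entry.getKey())) {
                    docFreq++;
                }
            }
            // Smoothed idf (an assumed formula): rare terms get a larger weight.
            double idf = Math.log((double) corpus.size() / (docFreq + 1)) + 1.0;
            scores.put(entry.getKey(), entry.getValue() * idf);
        }

        // Highest score first; ties broken alphabetically to keep the order stable.
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        List<String> subject = Arrays.asList("solr", "rails", "video", "video", "the");
        List<List<String>> corpus = Arrays.asList(
                subject,
                Arrays.asList("the", "rails", "blog"),
                Arrays.asList("the", "cat", "video"),
                Arrays.asList("the", "dog"));
        // "the" appears everywhere, so it scores lowest and drops out of the query terms.
        System.out.println(topTerms(subject, corpus, 2));
    }
}
```

The terms that survive this kind of filtering are the ones that would go into the "more like this" query, which is why common words don't drag in unrelated docs.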
However, the MoreLikeThis handler doesn’t give you any control over the ordering of your results - you just get relevance order. I wanted to boost more recent videos over older ones. I didn’t want to sort my results by date, because that potentially puts bad matches at the top. Instead I just wanted to boost more recent results so they generally appeared towards the top.
Now Solr includes a general search handler called the DisMaxRequestHandler, which supports this cool bf argument, by which you can provide a boost function. Essentially the function gets evaluated for every matching document and its result is added to the relevance score of the doc to arrive at its overall score. This works really nicely, and there’s even an example in the documentation showing how to boost by recency.
Unfortunately, the MoreLikeThis handler doesn’t have that boost function support. So I decided to hack it in. Very simply, I just copied the code that handles the “bf” argument from the DisMaxRequestHandler into the MoreLikeThisHandler. Under the covers, both handlers build a Lucene query to apply to your index, so the mod to MoreLikeThis just required building the query parts from the boost function and adding them to the “more like this” query. Then I just had to play around with my boost function until I got results in an order I liked.
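For a feel of what that tuning does, here is a small standalone sketch, not the code from my modified handler. It assumes the boost is simply added to the relevance score, and it uses the same shape as Solr's recip(x, m, a, b) function, which is a / (m * x + b). The Hit class, the recencyRank field (1 = newest, standing in for a reverse ordinal on a date field), and the boostWeight knob are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy model of an additive recency boost, not Solr or Lucene code.
public class RecencyBoost {

    /** Same shape as Solr's recip(x, m, a, b) function: a / (m * x + b). */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Hypothetical stand-in for a search hit. */
    static class Hit {
        final String id;
        final double relevance;  // score from the "more like this" query
        final int recencyRank;   // 1 = newest doc in the index

        Hit(String id, double relevance, int recencyRank) {
            this.id = id;
            this.relevance = relevance;
            this.recencyRank = recencyRank;
        }
    }

    /** Overall score: relevance plus a weighted recency boost (assumed additive). */
    static double score(Hit hit, double boostWeight) {
        return hit.relevance + boostWeight * recip(hit.recencyRank, 1, 1000, 1000);
    }

    /** Returns hit ids ordered by overall score, best first. */
    static List<String> rank(List<Hit> hits, double boostWeight) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((x, y) -> Double.compare(score(y, boostWeight), score(x, boostWeight)));
        List<String> ids = new ArrayList<>();
        for (Hit hit : sorted) {
            ids.add(hit.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("old-strong", 2.0, 5000),
                new Hit("new-good", 1.9, 1),
                new Hit("new-weak", 0.5, 2));
        System.out.println(rank(hits, 0.0)); // [old-strong, new-good, new-weak]
        System.out.println(rank(hits, 1.0)); // [new-good, old-strong, new-weak]
    }
}
```

With the boost switched on, the recent good match moves ahead of the older one, but the recent weak match stays at the bottom. That is the difference between boosting by date and sorting by date.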
I don’t pretend to have made this mod in the cleanest or best way possible. But it is nice that Solr/Lucene allow this kind of hacking, even though it’s a lot harder than hacking something like Rails! I’ve attached my modified MoreLikeThisHandler.java file to this post just in case it comes in handy for anyone else.
