Wow! Real-time Recommendations using Sunspot

Boosting your search to apply collaborative filtering with Solr

Posted on September 07, 2015

A good search is essential for helping users find things. Recommendations, however, are essential for helping them find what they want. "A lot of times, people don't know what they want until you show it to them." (Steve Jobs)

There are good recommenders built on top of Apache Mahout or Spark. Of course, any canned solution must be adapted to your business. It's a noble task normally done by brave data scientists.

Suppose you are not a data scientist and can't hire one. Let's also say you want to offer recommendations just to see what happens and then learn from the results. Here are some tips and a promising way to get there faster with Sunspot!

These tips are based on an awesome talk given by Trey Grainger at Lucene Revolution 2012 about using Solr as a real-time recommender engine.

Getting and indexing user tastes

We are going to cover collaborative filtering, a recommendation technique that uses people's tastes (likes). Basically, the idea is to find people with similar tastes, then recommend items those people like that the target user hasn't tried yet. For this post, let's recommend some songs \o/

First, we have to extract user tastes (the songs each user listened to). Then, for each song in our catalogue, we index who listened to it. Let's assume we've already indexed all songs through the Song model. In that case, we only have to add the user tastes:

class Song < ActiveRecord::Base
  searchable do
    integer :played_for, using: :user_tastes, multiple: true
  end
end

Here, :user_tastes is an auxiliary method that returns an array of the ids of users who played the song, read from logs, a database, or a tracking system. Sunspot indexes that array in the multi-valued played_for field.
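
To make the indexed data concrete, here is a plain-Ruby sketch of the mapping that ends up in the played_for field. The PLAY_LOG array and the users_by_song helper are hypothetical stand-ins for whatever log, database, or tracking system you read from; in the real model, user_tastes would return the array for a single song.

```ruby
# Hypothetical play log: one entry per (user, song) listen event.
PLAY_LOG = [
  { user_id: 42, song_id: 1 },
  { user_id: 42, song_id: 2 },
  { user_id: 89, song_id: 1 },
  { user_id: 89, song_id: 2 },
  { user_id: 12, song_id: 2 },
  { user_id: 42, song_id: 1 } # repeated listen, should count once
].freeze

# Builds { song_id => [ids of users who played it] }, i.e. the value
# that user_tastes would return for each song.
def users_by_song(play_log)
  play_log
    .group_by { |play| play[:song_id] }
    .transform_values { |plays| plays.map { |play| play[:user_id] }.uniq }
end

p users_by_song(PLAY_LOG)
```

Each user id appears at most once per song, so a facet count on this field later reflects the number of distinct songs two users share rather than the number of plays.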

Finding Similar Users

With all user tastes indexed, let's recommend songs for user 42, named Doni. First, we're going to find the users most similar to him:

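# my_tastes (ids of the songs the current user played) and COUNT (how many
# similar users to keep) are assumed to be defined elsewhere.
def most_similar_users
  similar_users = Song.search do
    with(:id, my_tastes)  # only songs the current user played
    facet(:played_for)    # count those songs per user who played them
  end

  similar_users.facet(:played_for).rows[0...COUNT].map do |f|
    {user_id: f.value, similarity: f.count}
  end
end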

Basically, we are implementing the following steps:

  1. with(:id, my_tastes) filters by Doni's tastes (the songs he played)
  2. facet(:played_for) groups by everybody who played the same songs Doni played

Here, Solr sorts the groups (facets) by size, and we take the first COUNT similar users. For each facet, f.value is a user id and f.count is the number of songs that user has in common with Doni. As a result, we have something like this:

{user_id: 89, similarity: 35},
{user_id: 12, similarity: 26},
{user_id: 190, similarity: 11},
...
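
To see what the facet is counting, the same computation can be sketched in plain Ruby without Solr. The INDEX hash and the similar_users_for helper are hypothetical. The sketch also drops the target user from the result: since Doni played every song in his own tastes, he would otherwise come out as his own best match, so the Sunspot version probably needs a similar exclusion.

```ruby
# Hypothetical in-memory index: song id => ids of users who played it
# (the contents of the played_for field).
INDEX = {
  1 => [42, 89, 12],
  2 => [42, 89],
  3 => [42, 89, 190],
  4 => [89, 12],
  5 => [190]
}.freeze

# Counts, for every other user, how many of the target user's songs
# they also played, then keeps the `count` largest overlaps.
def similar_users_for(index, user_id, count)
  my_tastes = index.select { |_song, users| users.include?(user_id) }.keys

  overlaps = Hash.new(0)
  my_tastes.each do |song_id|
    index[song_id].each { |other| overlaps[other] += 1 unless other == user_id }
  end

  overlaps
    .sort_by { |other, shared| [-shared, other] } # largest overlap first
    .first(count)
    .map { |other, shared| { user_id: other, similarity: shared } }
end

p similar_users_for(INDEX, 42, 2)
```

With this toy data, user 89 shares three songs with user 42 and user 12 shares one, so they are the two users returned.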

Recommending items through collaborative filtering

This is the core of our recommendation system! Based on the similarities, we are going to search for songs, filtering by similar users on the played_for field and boosting each user by their similarity level:

Song.search do
  adjust_solr_params do |params|
    params[:q] = similarity_query(similar_users)
  end
end

def similarity_query(similar_users)
  # Build "user_id^similarity" terms, e.g. "89^35 OR 12^26 OR 190^11"
  users = similar_users.map { |u| "#{u[:user_id]}^#{u[:similarity]}" }.join(" OR ")
  "played_for_im:(#{users})"
end

Through adjust_solr_params, we build the following Solr query:

solr/select?q=played_for_im:(89^35 OR 12^26 OR 190^11)

With this query, we only get songs played by users whose tastes are similar to Doni's. We also boost (give more weight to) users who have more songs in common with him.
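
As a rough mental model of the boosted query, each song can be scored by adding up the similarity of every similar user who played it. This plain-Ruby sketch is a simplification, since Solr's real relevance score involves other factors as well, and the PLAYED_FOR hash and recommend helper are hypothetical. It also skips songs the target user already played, which the query above does not do by itself.

```ruby
# Hypothetical index: song id => ids of users who played it.
PLAYED_FOR = {
  1 => [42, 89],
  2 => [89, 12],
  3 => [12],
  4 => [190],
  5 => [89, 12, 190]
}.freeze

# Scores each song by summing the similarity of the similar users who
# played it, skipping songs the target user already played.
def recommend(played_for, similar_users, target_user_id)
  weights = similar_users.to_h { |u| [u[:user_id], u[:similarity]] }

  scores = played_for.each_with_object({}) do |(song_id, users), acc|
    next if users.include?(target_user_id)

    score = users.sum { |user_id| weights.fetch(user_id, 0) }
    acc[song_id] = score if score.positive?
  end

  # Highest score first; ties broken by song id.
  scores.sort_by { |song_id, score| [-score, song_id] }.map(&:first)
end

similar = [
  { user_id: 89, similarity: 35 },
  { user_id: 12, similarity: 26 },
  { user_id: 190, similarity: 11 }
]
p recommend(PLAYED_FOR, similar, 42)
```

Song 5 ranks first here because all three similar users played it (35 + 26 + 11), while song 1 is left out because user 42 already played it.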

One advantage of recommending with a search engine is that we get real-time recommendations and can filter them. If we want to recommend only country music (Doni's favorite), we just have to include filters.

If we add a genre filter to the first query (finding similar users), we only get similar users who share Doni's taste in country music. If we add it to the second query instead, we find users with similar general tastes and then get the country songs they played that Doni hasn't played yet. The right choice depends on the dataset size and on the recall and precision you want from the search results.

Most recommenders use more accurate metrics for user similarity (e.g., Euclidean distance). The idea here is to get something running quickly, using what you already have: a search engine! I hope this gives you some good insights!
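
The second option can be sketched as raw Solr parameters. The genre_s field name is an assumption (it presumes a string genre field was indexed for songs), and recommendation_params is a hypothetical helper. fq is Solr's standard filter-query parameter, which restricts the results without changing their scores.

```ruby
# Boosted query over similar users, e.g. "played_for_im:(89^35 OR 12^26)".
def similarity_query(similar_users)
  users = similar_users.map { |u| "#{u[:user_id]}^#{u[:similarity]}" }.join(" OR ")
  "played_for_im:(#{users})"
end

# Second-query placement: similar users come from all genres, and the
# genre filter only narrows the songs that get recommended.
# `genre_s` is an assumed field name, not something the post defines.
def recommendation_params(similar_users, genre: nil)
  params = { q: similarity_query(similar_users) }
  params[:fq] = "genre_s:#{genre}" if genre
  params
end

similar = [{ user_id: 89, similarity: 35 }, { user_id: 12, similarity: 26 }]
p recommendation_params(similar, genre: "country")
```

Because the filter does not take part in scoring, the boosts from user similarity still decide the order of the country songs that come back.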