Launching an autocomplete with Sunspot

End-to-end autocomplete using Solr NGram filter

Posted on June 28, 2015

An autocomplete is an awesome feature for improving search conversion. A good autocomplete suggests not only terms that return results, but terms that lead to high-quality search results (e.g., the most popular searches).

Sunspot doesn't ship with an autocomplete feature out of the box. However, it's very easy to change the default Solr schema and create an autocomplete based on full-text search.

Adding NGram filter to schema.xml

By default, Sunspot's schema.xml doesn't include any Solr NGram filter. This filter is the basis of an autocomplete built on Solr. Basically, at indexing time it breaks words down into grams, which are subterms of a word.

There are several NGram filters in Solr. Here, we use EdgeNGramFilterFactory, which builds grams starting from the left edge of a word. With a minimum gram size of 3, the word Brazilian gets broken down into the following terms:

Brazilian => Bra Braz Brazi Brazil Brazili Brazilia Brazilian
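
To make the left-to-right breakdown concrete, here is a minimal plain-Ruby sketch of what a front-edge n-gram step does to a single token. It is only an illustration, not Solr's implementation: the edge_ngrams method and its keyword arguments are hypothetical names made up for this post, with defaults that mirror the gram sizes used in this article.

```ruby
# Illustrative only: mimics front-edge n-gram generation for one token.
# `edge_ngrams` is a hypothetical helper, not part of Solr or Sunspot.
def edge_ngrams(token, min_gram: 3, max_gram: 30)
  longest = [token.length, max_gram].min
  # Tokens shorter than min_gram yield an empty range, so no grams at all.
  (min_gram..longest).map { |size| token[0, size] }
end

puts edge_ngrams("Brazilian").join(" ")
# prints: Bra Braz Brazi Brazil Brazili Brazilia Brazilian
```

Note that in this sketch a token shorter than the minimum gram size (say, "SP") produces no grams, so a very short term would not be matched through this field.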

To enable this filter, we have to define a new Solr type to support indexing and searching in our grams:

<fieldType name="txt_ngram" class="solr.TextField" omitNorms="false">
  <analyzer type = "index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30" side="front"/>
  </analyzer>
  <analyzer type = "query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

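We've set solr.EdgeNGramFilterFactory to have a minimum gram size of 3 and a maximum of 30. Moreover, we've set this filter to generate grams from left to right (side="front"). Note that grams are generated only in the index analyzer; at query time, the typed text is just tokenized and lowercased, so it can match the stored grams directly. Now, we only have to create fields that use this new Solr type: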

<dynamicField name="*_ac" type="txt_ngram" indexed="true" multiValued="true" stored="false"/>
<dynamicField name="*_acs" type="txt_ngram" indexed="true" multiValued="true" stored="true"/>

Both fields follow the same pattern as the other Sunspot dynamic fields already present in schema.xml. The only difference between them is that *_acs also stores the original value.

Making models searchable

Now that we have set up Solr to support n-grams, we have to choose which model attribute will be broken into grams. If you are already using Sunspot, you can choose any attribute of your searchable models. Here, we create a new model to represent a term suggested by our autocomplete feature:

class SearchTerm < ActiveRecord::Base

  searchable do
    text :term, :as => :term_ac
    string :term, stored: true
    integer :page_views
  end

end

Notice that we are using the Solr field we've just created. Also, we added an attribute that counts page views of that term. As I mentioned before, a good autocomplete suggests good terms. In this case, we are suggesting terms that trigger the most popular searches.

Suggesting words

After creating a searchable model, we can run a full-text search against it (for instance, from an auxiliary class):

SearchTerm.search do
  fulltext query
  order_by(:page_views, :desc)
end

This code takes a string (query) and searches the documents indexed through Sunspot, most viewed first. When entering "bra", our autocomplete returns SearchTerm instances whose terms could be "Brazil", "Brasilia", or "São Paulo, Brazil".

How do we suggest "São Paulo, Brazil"? Well, when we indexed this term, the txt_ngram analyzer split it into the tokens São, Paulo and Brazil (the whitespace tokenizer splits on spaces, and the word delimiter filter drops the comma), and then generated grams for each token. So our search matched a gram of the last token, Brazil.
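
The matching described above can be approximated in plain Ruby. This is a rough sketch under stated assumptions, not Solr code: query_tokens, index_terms and match? are hypothetical helpers, the word delimiter step is reduced to stripping punctuation, and requiring every query token to match is an assumption about how the query parser is configured.

```ruby
# Rough plain-Ruby approximation of the txt_ngram analyzers; not Solr code.
MIN_GRAM = 3
MAX_GRAM = 30

# Query-side analysis: whitespace split, crude punctuation stripping
# (standing in for the word delimiter filter), then lowercasing.
def query_tokens(text)
  text.split(/\s+/)
      .map { |token| token.gsub(/[^\p{L}\p{N}]/, "").downcase }
      .reject(&:empty?)
end

# Index-side analysis: same tokens, each expanded into front-edge grams.
def index_terms(text)
  query_tokens(text).flat_map do |token|
    longest = [token.length, MAX_GRAM].min
    (MIN_GRAM..longest).map { |size| token[0, size] }
  end
end

# Assumption: a document matches when every query token equals an indexed gram.
def match?(document, query)
  indexed = index_terms(document)
  terms = query_tokens(query)
  !terms.empty? && terms.all? { |term| indexed.include?(term) }
end

puts index_terms("São Paulo, Brazil").join(" ")
puts match?("São Paulo, Brazil", "bra")
```

Because grams start at the left edge of each token, a query such as "zil" would not match in this sketch, while "bra" does.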

Now, you just have to create a controller that renders a JSON response with all suggestions. However, your work doesn't stop here =[
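
As a starting point for that response, here is a small sketch of how the JSON body could be built. Everything in it is an assumption for illustration: the Suggestion struct stands in for SearchTerm records, and suggestions_json, its limit argument and the "suggestions" key are hypothetical names. In a Rails controller, the records would come from the search's results and the string could be passed to render.

```ruby
require "json"

# Stand-in for a SearchTerm record; in the app this would be the model itself.
Suggestion = Struct.new(:term, :page_views)

# Hypothetical helper: turns already-sorted search results into the JSON
# body that a controller action could render.
def suggestions_json(results, limit: 10)
  terms = results.first(limit).map(&:term)
  JSON.generate(suggestions: terms)
end

results = [Suggestion.new("Brazil", 120), Suggestion.new("Brasilia", 45)]
puts suggestions_json(results)
# prints: {"suggestions":["Brazil","Brasilia"]}
```

Capping the number of suggestions keeps the payload small, which matters because an autocomplete endpoint is typically hit on almost every keystroke.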

Tips and thoughts

Depending on audience, caching and infrastructure, maintaining an autocomplete can demand a lot of people's time. If you don't have those resources, one option is to deploy everything on a Solr PaaS (Platform as a Service) like WebSolr. That's beyond my scope for now, but it could be the topic of a future post =]

Search is a journey, not a pre-defined destination. It's always possible, and necessary, to improve search features. For an autocomplete, it's necessary to keep your suggestions consistent by removing terms without results and updating term page views daily.

That's it. I hope I've shared some relevant thoughts about developing a good autocomplete =] Remember, details can make the difference in the quality of your search. Life is an eternal search, so keep on searching.