Rankum

Bringing search ranking close to perfection

Posted on May 11, 2016

When tuning search ranking, the possibilities may be infinite. Often, we consider ranked results perfect because the most popular items are always in the first positions.

Great, right? Not so much if our rank function is not good enough to surface fresh items, for example. So, how can we ensure that rank changes still account for different item types such as new, old, most popular, or most expensive items?

Rankum was built to help deal with these challenges. With this tool, we can apply metrics that measure how similar our search rank is to a "perfect search rank".

What is a perfect rank?

A perfect rank is a representation of an ideal rank. Imagine you could manually select the position of each item in the search results (e.g. A, B, C, E, F). With this representation in hand, we tune our search to be as close as possible to the perfect ranking (e.g. B, A, C, E, F).

In this simple example, the rank is not the perfect rank, but it is similar. Rankum provides features to define representations of real search ranks and to apply different metrics to measure their similarity!

Getting started

Let's get started by installing Rankum:

$ gem install rankum

Then, we can run it from the command line (stand-alone mode) or directly in our Ruby code.

Stand-alone mode

$ rankum -m FCP -r RankFileReader -p perfect_rank.txt -a my_rank.txt

  • metric (-m or --metric)
    Which metric is used to calculate rank similarity, such as FCP (Fraction of Concordant Pairs).
  • rank reader strategy (-r or --rank_reader)
    How to read ranks (from a file, from Solr or Elasticsearch). With RankFileReader, each line of the file is a search result item identifier. You can get an example here.
  • perfect rank file (-p or --perfect_rank_file_path)
    Where to find the perfect rank file. Only applicable to the RankFileReader strategy.
  • actual rank file (-a or --actual_rank_file_path)
    Where to find the actual rank file. Only applicable to the RankFileReader strategy.
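
As a rough illustration of the file format described above, the Ruby sketch below writes a perfect rank and an actual rank to the two files named in the command, one item identifier per line. The identifiers A to E are made up for illustration, and reading a file back with File.readlines is only an assumption about how a file-based reader could work, not Rankum's actual code.

```ruby
# Hypothetical item identifiers, one per search result.
perfect_rank = %w[A B C D E]
actual_rank  = %w[B C A D E]

# One identifier per line, matching the file names used in the command above.
File.write("perfect_rank.txt", perfect_rank.join("\n") + "\n")
File.write("my_rank.txt", actual_rank.join("\n") + "\n")

# Assumption: a file-based reader can recover the rank as an Array of lines.
read_back = File.readlines("my_rank.txt", chomp: true)
puts read_back.inspect
```

With both files in place, the stand-alone command shown earlier can be pointed at them through -p and -a.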

Similarity Metrics

To calculate rank similarity, Rankum extracts the following metrics:

FCP (Fraction of Concordant Pairs)

Basically, this metric counts how many of the pairs occurring in the perfect rank also occur in the actual rank. A pair such as AB means that A is ranked above B:

In this example, we can see (orange line) that item A dropped 2 positions. As a consequence, the pairs AB and AC (red) are not in the actual rank. So, FCP is:

$$FCP(actual\_rank) = \frac{\#matched\_pairs}{\#total\_pairs} = \frac{8}{10} = 0.8$$

Here, #matched_pairs is the number of green pairs, those that are in both ranks, and #total_pairs is the number of pairs in the perfect rank. The red pairs would need to be in the actual rank to increase the similarity.

FCP punishes changes in the first positions more: if D were the item in the wrong position instead, FCP would be higher (0.9). This is what we seek when tuning search engines. We want the best items in the top positions, since most users won't check many search pages.
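
To make the arithmetic concrete, here is a minimal Ruby sketch of the pair counting. It is not Rankum's source code: the fcp method name is hypothetical, it assumes every item identifier is unique, and it assumes the example uses the perfect rank A, B, C, D, E with A dropping to third place.

```ruby
# Sketch of Fraction of Concordant Pairs, assuming unique item identifiers.
# A pair [x, y] from the perfect rank "matches" when x is also ranked
# above y in the actual rank.
def fcp(perfect_rank, actual_rank)
  # Map each item to its position in the actual rank.
  position = actual_rank.each_with_index.to_h

  # All pairs [x, y] with x above y in the perfect rank.
  pairs = perfect_rank.combination(2).to_a
  return 0.0 if pairs.empty?

  matched = pairs.count do |x, y|
    position.key?(x) && position.key?(y) && position[x] < position[y]
  end
  matched.fdiv(pairs.size)
end

perfect = %w[A B C D E]

# A dropped 2 positions: the pairs AB and AC no longer match.
puts fcp(perfect, %w[B C A D E]) # => 0.8

# Only D is misplaced (swapped with E): a single pair, DE, is lost.
puts fcp(perfect, %w[A B C E D]) # => 0.9
```

An item near the top takes part in more pairs as the higher-ranked element, so moving it down can break more of them, which is why the first case scores lower than the second.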

Rankum also supports "category items" rather than item ids. So, you can define a perfect rank as (premium, premium, new, new, regular). In this case, you want premium items in the first 2 positions; their ids don't matter, only their types!

Rankum 1.0.0 only provides FCP; other metrics such as Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG) will be added soon. Some of these metrics are explained, in the context of evaluating recommendations, in this great MOOC: Introduction to Recommender Systems from the University of Minnesota.

Rank Readers

For now, Rankum only contains a reader that extracts rank items from a text file (RankFileReader). However, Rankum is flexible enough to support other strategies as well. Basically, a reader strategy only needs two methods: perfect_rank and actual_rank. Both methods must return an Array. An example can be checked here.
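
The two-method contract can be sketched in plain Ruby as follows. The class name InMemoryRankReader and its keyword-argument constructor are hypothetical, chosen only for illustration. How Rankum instantiates a reader or resolves the name passed to -r is not covered here, so a real strategy may need a different constructor.

```ruby
# Hypothetical reader strategy that serves ranks held in memory.
# Only the two method names, perfect_rank and actual_rank, come from
# the description above; everything else is an assumption.
class InMemoryRankReader
  def initialize(perfect:, actual:)
    @perfect = perfect
    @actual = actual
  end

  # Returns the ideal ordering as an Array.
  def perfect_rank
    @perfect.dup
  end

  # Returns the ordering produced by the search engine as an Array.
  def actual_rank
    @actual.dup
  end
end

reader = InMemoryRankReader.new(
  perfect: %w[A B C D E],
  actual:  %w[B C A D E]
)
puts reader.perfect_rank.inspect
puts reader.actual_rank.inspect
```

A reader backed by Solr or Elasticsearch would keep the same two methods and replace the in-memory arrays with the results of a query.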

Considerations

Rankum was built out of problems with rank tuning. Reaching a good similarity with a perfect (model) rank gives me confidence to put rank changes into production. Before Rankum, I sometimes felt alone in the dark =(

My plan is to keep developing Rankum to make it easier to use. Also, I want to add more rank readers and metrics. Rankum is open source software under the MIT license. Its repository can be found here. Any collaboration and feedback are very welcome!

Keep on searching