How to Use Postgres to Calculate and Save Rankings


Sometimes you need to calculate a rank and save it to a field in Postgres. This example uses Rails ActiveRecord, but any way of executing the query will do.

A naive approach is to iterate over the records with an index. An example in Ruby:

Product.order(price: :desc).each_with_index do |product, index|
  product.price_rank = index + 1
  product.save
end
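To see exactly what that loop computes, here is a minimal in-memory sketch using plain Ruby Structs in place of ActiveRecord (the names are made up for illustration). Note that with a real model, each `save` would be a separate round-trip to the database:

```ruby
# Hypothetical in-memory stand-in for the Product model.
Product = Struct.new(:name, :price, :price_rank)

products = [
  Product.new("A", 30.0),
  Product.new("B", 20.0),
  Product.new("C", 10.0),
]

# Same logic as the ActiveRecord loop: order by price descending,
# then assign rank = index + 1. With ActiveRecord, each iteration
# would also issue its own UPDATE statement.
products.sort_by { |p| -p.price }.each_with_index do |product, index|
  product.price_rank = index + 1
end

products.each { |p| puts "#{p.name}: #{p.price_rank}" }
```

Because the rank is just the position in the sorted list, this version silently ignores ties, which matters later.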

This might work great if you have a limit(10) or a small data sample. But what if you have 100,000 records, or even worse, 1 million?

For 100 items, the above code took 0.181s.
For 1,000 items, 0.49s.
For 100,000 items, 155.04s, which is roughly 2 minutes 35 seconds.

Using Postgres, I know it will be faster, but by how much?

query = <<~SQL
  UPDATE products
  SET price_rank = r.rnk
  FROM (SELECT id, RANK() OVER (ORDER BY price DESC) AS rnk FROM products) r
  WHERE products.id = r.id
SQL

ActiveRecord::Base.connection.execute(query)

On the Ruby side I am seeing run times of about 0.0005s, because all of the work has been dumped onto the database.

Running EXPLAIN ANALYZE on the query shows where the time actually goes:

 Update on products  (cost=16813.40..25200.70 rows=101002 width=112) (actual time=15247.855..15247.855 rows=0 loops=1)
   ->  Hash Join  (cost=16813.40..25200.70 rows=101002 width=112) (actual time=8693.946..12299.080 rows=101002 loops=1)
         Hash Cond: (products.id = r.id)
         ->  Seq Scan on products  (cost=0.00..3391.02 rows=101002 width=68) (actual time=0.584..990.371 rows=101002 loops=1)
         ->  Hash  (cost=14563.87..14563.87 rows=101002 width=56) (actual time=8693.010..8693.010 rows=101002 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 2685kB
               ->  Subquery Scan on r  (cost=11786.32..14563.87 rows=101002 width=56) (actual time=2278.120..7232.235 rows=101002 loops=1)
                     ->  WindowAgg  (cost=11786.32..13553.85 rows=101002 width=24) (actual time=2278.083..5283.472 rows=101002 loops=1)
                           ->  Sort  (cost=11786.32..12038.82 rows=101002 width=16) (actual time=2278.053..3262.086 rows=101002 loops=1)
                                 Sort Key: products_1.price DESC
                                 Sort Method: external merge  Disk: 2576kB
                                 ->  Seq Scan on products products_1  (cost=0.00..3391.02 rows=101002 width=16) (actual time=0.639..992.346 rows=101002 loops=1)
 Planning time: 3.271 ms
 Execution time: 15252.513 ms 

Over 100,000 records ranked and updated in about 15 seconds. Not bad.

You also get true ranking behavior when there is a tie: #1, #2, #2, #2, #5, #6, and so on. The naive Ruby version was simple and did not account for ties. You could implement that in Ruby too, but it would add bloat and complexity.
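To see that tie behavior concretely, here is a small plain-Ruby sketch (not part of the original code) that mirrors the semantics of SQL's RANK(): tied values share a rank, and the next distinct value skips ahead by the number of tied rows:

```ruby
# Competition ranking ("1, 2, 2, 2, 5, 6"), matching what
# RANK() OVER (ORDER BY value DESC) produces in Postgres.
def rank_desc(values)
  sorted = values.sort.reverse
  # A value's rank is 1 + the position of its first occurrence
  # in the descending sort, so ties share a rank and create gaps.
  values.map { |v| sorted.index(v) + 1 }
end

prices = [50, 40, 40, 40, 30, 20]
rank_desc(prices) # => [1, 2, 2, 2, 5, 6]
```

If you want ties without gaps (#1, #2, #2, #2, #3), Postgres also provides DENSE_RANK(), which is a drop-in replacement for RANK() in the query above.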