How to Use Postgres to Calculate and Save Rankings


Sometimes you need to calculate a rank and save it to a field in Postgres. This example uses Rails ActiveRecord, but as long as you can execute the query you are good.
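For context, this assumes the products table already has an integer price_rank column. A minimal migration sketch (the class name and migration version here are assumptions, not from the original post):

class AddPriceRankToProducts < ActiveRecord::Migration[5.2]
  def change
    # Integer column that will hold the calculated rank.
    add_column :products, :price_rank, :integer
  end
end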

A naive way would be to build an iterative method with an index. An example in Ruby:

Product.order(price: :desc).each_with_index do |product, index|
  product.price_rank = index + 1
  product.save
end

This might work great if you have a limit(10) or a small data sample. But what if you have 100,000 records, or even worse, 1 million records?

For 100 items the above code took 0.181s.
For 1,000 items, 0.49s.
For 100,000 items, 155.04s, which is roughly 2 minutes 35 seconds.
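A rough sketch of how you might reproduce timings like these with Ruby's Benchmark module (the harness itself is an assumption; the original post does not show one):

require 'benchmark'

# Time the naive iterative ranking over whatever data set is loaded.
elapsed = Benchmark.realtime do
  Product.order(price: :desc).each_with_index do |product, index|
    product.price_rank = index + 1
    product.save
  end
end
puts "#{elapsed.round(3)}s"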

Now, using Postgres I know it will be faster, but by how much?

query = "UPDATE products SET price_rank = r.rnk FROM (SELECT id, RANK() OVER (ORDER BY price DESC) as rnk FROM products) r WHERE products.id = r.id"

ActiveRecord::Base.connection.execute(query)

I am getting run times of 0.0005s because I am dumping the work onto the database.

Running EXPLAIN ANALYZE on the query:

 Update on products  (cost=16813.40..25200.70 rows=101002 width=112) (actual time=15247.855..15247.855 rows=0 loops=1)
   ->  Hash Join  (cost=16813.40..25200.70 rows=101002 width=112) (actual time=8693.946..12299.080 rows=101002 loops=1)
         Hash Cond: (products.id = r.id)
         ->  Seq Scan on products  (cost=0.00..3391.02 rows=101002 width=68) (actual time=0.584..990.371 rows=101002 loops=1)
         ->  Hash  (cost=14563.87..14563.87 rows=101002 width=56) (actual time=8693.010..8693.010 rows=101002 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 2685kB
               ->  Subquery Scan on r  (cost=11786.32..14563.87 rows=101002 width=56) (actual time=2278.120..7232.235 rows=101002 loops=1)
                     ->  WindowAgg  (cost=11786.32..13553.85 rows=101002 width=24) (actual time=2278.083..5283.472 rows=101002 loops=1)
                           ->  Sort  (cost=11786.32..12038.82 rows=101002 width=16) (actual time=2278.053..3262.086 rows=101002 loops=1)
                                 Sort Key: products_1.price DESC
                                 Sort Method: external merge  Disk: 2576kB
                                 ->  Seq Scan on products products_1  (cost=0.00..3391.02 rows=101002 width=16) (actual time=0.639..992.346 rows=101002 loops=1)
 Planning time: 3.271 ms
 Execution time: 15252.513 ms 
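If you want to see a plan like the one above from the Rails console, one way (a sketch reusing the query string defined earlier) is to prepend EXPLAIN ANALYZE and print each row of the result:

plan = ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE #{query}")
plan.each { |row| puts row["QUERY PLAN"] }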

Over 100,000 records ranked and updated in about 15 seconds. Not bad.

You also get true ranking behavior when there is a tie, like #1, #2, #2, #2, #5, #6, etc. The iterative version was simple and did not account for that. You can implement it in Ruby if you want to, but you'll add more bloat and complexity.
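To give a sense of that extra bloat, here is a rough sketch of what tie-aware (competition-style, matching RANK()) ranking might look like in Ruby; this is illustrative, not from the original post. Postgres also offers DENSE_RANK() if you would rather get 1, 2, 2, 2, 3 than skip to 5.

rank = 0
previous_price = nil

Product.order(price: :desc).each_with_index do |product, index|
  # Only advance the rank when the price changes, so tied prices share a rank
  # and the next distinct price jumps ahead (1, 2, 2, 2, 5, ...).
  rank = index + 1 unless product.price == previous_price
  previous_price = product.price
  product.price_rank = rank
  product.save
end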

Is Rails Scalable?


Being an optimist, I would say, yes! Having worked on a large-scale, high-visibility site, I've personally seen it happen, so definitely yes! When budget was not a limiting factor, I saw vertical and horizontal scaling at its best. Get the biggest, baddest app server, then multiply that by 20 behind multiple load balancers. CDN-cache everything possible. Scaling solved. Millions of views and thousands of transactions per second, no problem. The databases were actually starting to buckle from all the connections. Enter PgBouncer connection pooling and more load balancers, but that is the point: you can scale to the point where the problem isn't code but infrastructure.

But what if you can't? Here are my experiences.

Vertical Scaling

You add more RAM, more CPUs, faster pipes (fiber optics directly to another server or network), SSDs. This is all possible, but as the hardware reaches its limits it becomes the limiting factor. There is a threshold: with enough users, the server can still melt down. This suits linear growth, not exponential growth.

This also holds true for the database server, as most issues are database-related. Slow queries can run faster by beefing up the hardware they run on.

I consider this a quick fix for linear or predictable growth.

Horizontal Scaling

Load balancing! Get more servers to do the job. Two or twenty are better than one. Share the load between servers. This can and will get costly, be it cloud or metal or a combination of the two. Horizontal scaling, if done correctly, can be a really easy solution to scaling issues.
