Optimized top-N-per-group Active Record associations using lateral joins
activerecord-has_some_of_many
This gem adds new optimized Active Record association methods (has_one_of_many
, has_some_of_many
) for “top N” queries to ActiveRecord using JOIN LATERAL
that are eager-loadable (includes(:association)
, preloads(:association)
) to avoid N+1 queries, and is compatible with typical queries and batch methods (find_each
, in_batches
, find_in_batches
). For example, you might have these types of queries in your application:
votes_count
for each postYou can read more about these types of queries on Benito Serna’s “Fetching the top n per group with a lateral join with rails”.
This gem is only compatible with databases that offer LATERAL
joins within Active Record. As far as I’m aware, that is only Postgres.
This gem is not necessary on SQLite, as SQLite will perform lateral-like behavior on join queries by default. MySQL has support for lateral queries, but they are not yet implemented in Active Record.
Really complex queries may not work; please open an issue! This library works by rewriting Active Record queries; it’s possible to create some very complex queries with Active Record. Some things that are specifically known to work:
Add to your gemfile, and run bundle install
:
gem "activerecord-has_some_of_many"
Then you can use has_one_of_many
and has_some_of_many
in your ActiveRecord models to define these associations.
class User < ActiveRecord::Base
has_one_of_many :last_post, -> { order("created_at DESC") }, class_name: "Post"
# You can also use `has_some_of_many` to get the top N records. Be sure to add a limit to the scope.
has_some_of_many :last_five_posts, -> { order("created_at DESC").limit(5) }, class_name: "Post"
# More complex scopes are possible, for example:
has_one_of_many :top_comment, -> { where(published: true).order("votes_count DESC") }, class_name: "Comment"
has_some_of_many :top_ten_comments, -> { where(published: true).order("votes_count DESC").limit(10) }, class_name: "Comment"
end
# And then preload/includes and use them like any other Rails association:
User.where(active: true).includes(:last_post, :last_five_posts, :top_comment).each do |user|
user.last_post
user.last_five_posts
user.top_comment
end
# Add compound indexes to your database to make these queries fast!
add_index :comments, [:post_id, :created_at]
add_index :comments, [:post_id, :votes_count]
Finding the “Top N” is a common problem, that can be easily solved with a JOIN LATERAL
when writing raw SQL queries. Lateral Joins were introduced in Postgres 9.3:
a LATERAL join is like a SQL foreach loop, in which Postgres will iterate over each row in a result set and evaluate a subquery using that row as a parameter. (source)
For example, to find only the one most recent comments for a collection of posts, we might write:
SELECT "comments".*
FROM "posts"
INNER JOIN LATERAL (
SELECT "comments".*
FROM "comments"
WHERE "comments"."post_id" = "posts"."id"
ORDER BY "comments"."created_at" DESC
LIMIT 1
) lateral_table ON TRUE
WHERE "posts"."id" IN (1, 2, 3, 4, 5)
Active Record associations present a bit of a challenge. This is because the association query has WHERE
conditions added after the association scope that allows changing the foreign key, but not the column name. This means that some indirection is necessary in order to make the query work and have the conditions applied to the correct column in a way that the query planner can efficiently understand and optimize:
SELECT "comments".*
FROM (
SELECT
"posts"."id" AS post_id_alias,
"lateral_table".*
FROM "posts"
INNER JOIN LATERAL (
SELECT "comments".*
FROM "comments"
WHERE "comments"."post_id" = "posts"."id"
ORDER BY "comments"."created_at" DESC
LIMIT 1
) lateral_table ON TRUE
) comments
WHERE "comments"."post_id_alias" IN (1, 2, 3, 4, 5)
The resulting optimized EXPLAIN ANALYZE
looks like:
Nested Loop (cost=0.56..41.02 rows=5 width=72) (actual time=0.058..0.082 rows=4 loops=1)
-> Index Only Scan using posts_pkey on posts (cost=0.28..17.46 rows=5 width=8) (actual time=0.022..0.027 rows=4 loops=1)
Index Cond: (id = ANY ('{1,2,3,4,5}'::bigint[]))
Heap Fetches: 0
-> Limit (cost=0.29..4.70 rows=1 width=64) (actual time=0.012..0.013 rows=1 loops=4)
-> Index Scan Backward using index_comments_on_post_id_and_created_at on comments (cost=0.29..44.46 rows=10 width=64) (actual time=0.012..0.012 rows=1 loops=4)
Index Cond: (post_id = posts.id)
Planning Time: 0.200 ms
Execution Time: 0.106 ms
Back in 2018 I (Ben Sheldon) was working to speed up Open311 Status, an uptime and performance monitor for government websites; I was using a Window Function at the time to query the most recent status for each monitored website, and it was slow 🐌 I tweeted about the problem and the Postgres Twitter account replied and told me about LATERAL
joins ✨
A few years after that, Benito Serna shared on Reddit an excellent series of blog posts about fetching latest-N-of-each records. The first post didn’t mention LATERAL
joins, so I commented on that and he updated the posts to include it 🙌 Since then it’s been a go-to reference for these types of queries.
bundle exec rake