Welcome to my personal website.
Salut! I'm a Ruby/Rust/Crystal developer with a devops background and leadership experience. You can also find me here: GitHub / Twitter / LinkedIn / Email.
Tracking down all the associations that need to be eager loaded in order to prevent N+1 queries can be tedious. Your code has to be instrumented properly and most of the times you need to reason about every single query, one by one. On top of that, eager loading can be fussy: calling where
, order
or limit
on your associations might invalidate your eager loading efforts in some unexpected ways.
This article will present an automated way of dealing with N+1 queries and it will explain how to go around some of the limitations of eager loading in ActiveRecord. Furthermore, it will show you how to use the query cache to your benefit and how to write tests to prevent those sneaky N+1 queries from coming back.
Goldiloader is a gem that eager loads your associations automatically and only when needed.
Just add it in your Gemfile
:
gem "goldiloader"
…and watch as your associations are magically eager loaded:
> users = User.limit(5).to_a
# SELECT "users".* FROM "users" LIMIT 5
> users.each { |user| user.posts.to_a }
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5)
Notice how there was no need to explicitly call includes(:posts)
when querying users. Without Goldiloader, that second call would have triggered five queries instead of one. With Goldiloader, you basically don’t have to think about calling includes
any more.
Goldiloader pairs very nicely with GraphQL APIs. The moment your API allows for querying associations or associations of associations, you have a little N+1 nightmare on your hands. GraphQL APIs with various different clients are hard to optimize because they might be used in so many different ways. Integrating something like graphql-batch could address the problem, but you have to apply it to every individual case and it’s a more intrusive solution.
On top of that, if you’re working on a large code base or if you’re inexperienced about dealing with N+1 queries, Goldiloader might give you a nice performance boost at a low cost, albeit with some limitations which we will discuss in the next part.
where
, order
, limit
Applying where
, order
or limit
clauses on your ActiveRecord associations will break eager loading, whether you’re using Goldiloader or not.
In order to make the following examples more generic, I’ll be using
includes
calls instead of Goldiloader. If you’re using Goldiloader, simply remove them or consider them redundant.
Let’s see what happens when we try to order
our posts:
> users = User.includes(:posts).limit(5).to_a
# SELECT "users".* FROM "users" LIMIT 5
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5)
> users.each { |user| user.posts.order(:created_at).to_a }
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1 ORDER BY "posts"."created_at" ASC
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 2 ORDER BY "posts"."created_at" ASC
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 3 ORDER BY "posts"."created_at" ASC
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 4 ORDER BY "posts"."created_at" ASC
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 5 ORDER BY "posts"."created_at" ASC
Even though calling includes(:posts)
produced an IN
query which seems to cover all our posts, applying the order
clause on our association ignored this and triggered a bunch of N+1 queries. In order for eager loading to work, the eager loaded query should match the query required to fetch your association.
One way to avoid this is by moving the order
inside a default scope:
class Post
default_scope { order(:created_at) }
end
> users = User.includes(:posts).limit(5).to_a
# SELECT "users".* FROM "users" LIMIT 5
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5) ORDER BY "posts"."created_at" ASC
> users.each { |user| user.posts.to_a }
# No queries here, they've been eager loaded already
…another way is by moving the order
inside the parent association:
class User
has_many :posts, -> { order(:created_at) }
end
> users = User.includes(:posts).limit(5).to_a
# SELECT "users".* FROM "users" LIMIT 5
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5) ORDER BY "posts"."created_at" ASC
> users.each { |user| user.posts.to_a }
# No queries here, they've been eager loaded already
…and yet another way is by moving the order
inside a scoped parent association:
class User
has_many :ordered_posts, -> { order(:created_at) }, class_name: "Post"
end
> users = User.includes(:ordered_posts).limit(5).to_a
# SELECT "users".* FROM "users" LIMIT 5
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5) ORDER BY "posts"."created_at" ASC
> users.each { |user| user.ordered_posts.to_a }
# No queries here, they've been eager loaded already
Let’s see what happens when we apply a where
condition to our posts:
> users = User.includes(:posts).limit(5).to_a
# SELECT "users".* FROM "users" LIMIT 5
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5)
> users.each { |user| user.posts.where("created_at < ?", 1.week.ago).to_a }
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1 AND (created_at < '2020-09-25 08:55:05.919824')
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 2 AND (created_at < '2020-09-25 08:55:05.919824')
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 3 AND (created_at < '2020-09-25 08:55:05.919824')
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 4 AND (created_at < '2020-09-25 08:55:05.919824')
# SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 5 AND (created_at < '2020-09-25 08:55:05.919824')
Ok, that didn’t work. But we can fix it exactly the same way.
You can move the where
condition inside a default scope:
class Post
default_scope { where("created_at < ?", 1.week.ago) }
end
> users = User.includes(:posts).limit(5).to_a
> users.each { |user| user.posts.to_a }
…or inside the parent association:
class User
has_many :posts, -> { where("created_at < ?", 1.week.ago) }
end
> users = User.includes(:posts).limit(5).to_a
> users.each { |user| user.posts.to_a }
…or inside a scoped parent association:
class User
has_many :posts_from_one_week_ago, -> { where("created_at < ?", 1.week.ago) }
end
> users = User.includes(:posts_from_one_week_ago).limit(5).to_a
> users.each { |user| user.posts_from_one_week_ago.to_a }
Mastering eager loading basically means relying more on default scopes and applying scopes inside your associations rather than using those scopes directly.
What about limit
and offset
? Depending on your use case, you could apply the same techniques. But there’s one big use case that just doesn’t fit here: pagination. How would you supply the page number? And while we’re at it, how would you deal with where
conditions or scopes that contain a variable?
Unlike scopes, default scopes and scoped associations don’t take arguments. If we’d like to provide our associations with an outside parameter while avoiding N+1 queries, we’ll have to think of something else.
Consider the following code which produces N+1 queries:
# An outside parameter
time = 1.week.ago
users = User.includes(:posts).to_a
users.each { |user| user.posts.where("created_at < ?", time).to_a } # N+1!
In such cases, you can turn your queries around and let the association become the main subject of that query:
# An outside parameter
time = 1.week.ago
posts = Post.includes(:user).where("created_at < ?", time).to_a
# SELECT "posts".* FROM "posts" WHERE (created_at < '2020-09-27 10:03:23.478773')
# SELECT "users".* FROM "users" WHERE "users"."id" IN (1,2,3,4,5)
users = posts.map(&:user)
# No queries here, they've been eager loaded already
Let’s say you’d like to query the total number of posts for every user:
users = User.all
# SELECT "users".* from "users"
users.each { |user| user.posts.count }
# SELECT COUNT(*) FROM "posts" WHERE user_id = 1
# SELECT COUNT(*) FROM "posts" WHERE user_id = 2
# SELECT COUNT(*) FROM "posts" WHERE user_id = 3
# SELECT COUNT(*) FROM "posts" WHERE user_id = 4
# SELECT COUNT(*) FROM "posts" WHERE user_id = 5
As you assumed, it triggered a bunch of N+1 queries. But this time there’s no way to eager load these aggregate values by calling includes
.
Instead you can make use of the ActiveRecord query cache. By default, ActiveRecord caches results for every individual SQL query, ensuring that subsequent calls placed within the same web request or background job will not hit the database.
Our COUNT
queries differ though – every distinct user_id
will break the caching. Then again, there’s nothing speaking against rewriting our queries to produce the same SQL every time:
users = User.all
users.each do |user|
posts_count_per_user = Post.group(:user_id).count # Returns a Hash
posts_count_per_user[user.id] || 0
end
Running this code from within a web request will produce the following log output:
# (1.4ms) SELECT "users".* from "users"
# (2.0ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
The first iteration triggered a COUNT
query, but all subsequent calls were cached, which means they didn’t hit the database and the N+1 situation was avoided.
Keep in mind that relying on the query cache too much might have a potential impact on the amount of allocations and consequently memory usage of your app, especially when triggering queries that have to initialize many ActiveRecord models. For this reason, prefer using includes
when possible or write your aggregate queries in such a way that they resolve to simple structures (hashes or arrays of primitive values).
If you’d like to enable the query cache outside of web requests or background jobs or if you’d like to try it out in your rails console
, you can call:
ActiveRecord::Base.connection.enable_query_cache!
# Any queries triggered here might be cached
ActiveRecord::Base.connection.disable_query_cache!
Given a large code base, introducing more N+1 queries is fairly easy – as easy as writing user.posts
inside a loop, somewhere deep inside a template, without remembering to also eager load the association in the controller. But you can write tests to prevent that…
First, let’s settle on the behaviour we’d like to test: any request should trigger at most one SQL query per table. In order to track and count all these queries, we could use ActiveSupport instrumentation to hook up to the sql.active_record
event, the same way ActiveRecord is tested.
I already packaged this in a tiny gem called sql_spy.
Just add it to your Gemfile
:
gem "sql_spy"
…and let’s write our first controller test:
require "sql_spy"
class PostsControllerTest < ActionDispatch::IntegrationTest
test "GET /posts should not trigger N+1 queries" do
# Add some realistic test data here
queries = SqlSpy.track do
get "/posts"
end
select_queries_by_model = queries.select(&:select?).group_by(&:model_name)
# We only want the SELECT queries and we'd like them grouped by model
assert select_queries_by_model.all? { |_, queries| queries.count <= 1 }
# Our tolerance rate is 1 query per table
# You can increase this value depending on your business logic
end
end
Some things are worth mentioning here. Your test is only as correct as the test data you provide it with. Ideally try to keep a realistic set of fixtures around (3-4 examples of each core model) or create a realistic environment before the test run.
Some requests might genuinely be triggering more than one query per table. For example, paginated requests usually produce two queries: one to fetch the records, another one for the page count. In such cases, you can increase the tested tolerance rate, while also increasing the number of records per table in your test setup, to make sure you’re still catching those N+1 queries.
You can introduce these N+1 regression tests to every critical or data-intensive controller action. Their setup is fairly cheap, while N+1 queries can come at a very high cost for your database and your response times.
The best results will come from applying a combination of the solutions above. Golidloader will get rid of some of your N+1 queries, but you’ll also need to start writing your associations with eager loading in mind. Testing for N+1 regressions and proper instrumentation will keep your hard-won performance improvements intact.
What do we say to the God of N+1 Queries? Not today.
If you enjoyed my blog post, please spread the news: