:zap: Powerful tool for avoiding N+1 DB or HTTP queries
This gem provides a generic lazy batching mechanism to avoid N+1 DB queries, HTTP queries, etc.
Developers from these companies use
BatchLoader:
loader).
Let's have a look at the code with N+1 queries:
def load_posts(ids) Post.where(id: ids) endposts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3) # _ ↓ _ # ↙ ↓ ↘ users = posts.map do |post| # U ↓ ↓ SELECT * FROM users WHERE id = 1 post.user # ↓ U ↓ SELECT * FROM users WHERE id = 2 end # ↓ ↓ U SELECT * FROM users WHERE id = 3 # ↘ ↓ ↙ # ¯ ↓ ¯ puts users # Users
The naive approach would be to preload dependent objects on the top level:
# With ORM in basic cases def load_posts(ids) Post.where(id: ids).includes(:user) endBut without ORM or in more complicated cases you will have to do something like:
def load_posts(ids)
load posts
posts = Post.where(id: ids) user_ids = posts.map(&:user_id)
load users
users = User.where(id: user_ids) user_by_id = users.each_with_object({}) { |user, memo| memo[user.id] = user }
map user to post
posts.each { |post| post.user = user_by_id[post.user_id] } end
posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3) # _ ↓ _ SELECT * FROM users WHERE id IN (1, 2, 3) # ↙ ↓ ↘ users = posts.map do |post| # U ↓ ↓ post.user # ↓ U ↓ end # ↓ ↓ U # ↘ ↓ ↙ # ¯ ↓ ¯ puts users # Users
But the problem here is that
load_postsnow depends on the child association and knows that it has to preload data for future use. And it'll do it every time, even if it's not necessary. Can we do better? Sure!
With
BatchLoaderwe can rewrite the code above:
def load_posts(ids) Post.where(id: ids) enddef load_user(post) BatchLoader.for(post.user_id).batch do |user_ids, loader| User.where(id: user_ids).each { |user| loader.call(user.id, user) } end end
posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3) # _ ↓ _ # ↙ ↓ ↘ users = posts.map do |post| # BL ↓ ↓ load_user(post) # ↓ BL ↓ end # ↓ ↓ BL # ↘ ↓ ↙ # ¯ ↓ ¯ puts users # Users SELECT * FROM users WHERE id IN (1, 2, 3)
As we can see, batching is isolated and described right in a place where it's needed.
In general,
BatchLoaderreturns a lazy object. Each lazy object knows which data it needs to load and how to batch the query. As soon as you need to use the lazy objects, they will be automatically loaded once without N+1 queries.
So, when we call
BatchLoader.forwe pass an item (
user_id) which should be collected and used for batching later. For the
batchmethod, we pass a block which will use all the collected items (
user_ids):
BatchLoader.for(post.user_id).batch do |user_ids, loader| ... end
Inside the block we execute a batch query for our items (
User.where). After that, all we have to do is to call
loaderby passing an item which was used in
BatchLoader.formethod (
user_id) and the loaded object itself (
user):
BatchLoader.for(post.user_id).batch do |user_ids, loader| User.where(id: user_ids).each { |user| loader.call(user.id, user) } end
When we call any method on the lazy object, it'll be automatically loaded through batching for all instantiated
BatchLoaders:
puts users # => SELECT * FROM users WHERE id IN (1, 2, 3)
For more information, see the Implementation details section.
Now imagine we have a regular Rails app with N+1 HTTP requests:
# app/models/post.rb class Post < ApplicationRecord def rating HttpClient.request(:get, "https://example.com/ratings/#{id}") end endapp/controllers/posts_controller.rb
class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map { |post| {id: post.id, rating: post.rating} } # N+1 HTTP requests for each post.rating
render json: serialized_posts
end end
As we can see, the code above will make N+1 HTTP requests, one for each post. Let's batch the requests with a gem called parallel:
class Post < ApplicationRecord def rating_lazy BatchLoader.for(post).batch do |posts, loader| Parallel.each(posts, in_threads: 10) { |post| loader.call(post, post.rating) } end end...
end
loaderis thread-safe. So, if
HttpClientis also thread-safe, then with
parallelgem we can execute all HTTP requests concurrently in threads (there are some benchmarks for concurrent HTTP requests in Ruby). Thanks to Matz, MRI releases GIL when thread hits blocking I/O – HTTP request in our case.
In the controller, all we have to do is to replace
post.ratingwith the lazy
post.rating_lazy:
class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map { |post| {id: post.id, rating: post.rating_lazy} }render json: serialized_posts
end end
BatchLoadercaches the loaded values. To ensure that the cache is purged between requests in the app add the following middleware to your
config/application.rb:
config.middleware.use BatchLoader::Middleware
See the Caching section for more information.
Batching is particularly useful with GraphQL. Using such techniques as preloading data in advance to avoid N+1 queries can be very complicated, since a user can ask for any available fields in a query.
Let's take a look at the simple graphql-ruby schema example:
class MyProjectSchema < GraphQL::Schema query Types::QueryType endmodule Types class QueryType < Types::BaseObject field :posts, [PostType], null: false
def posts Post.all end
end end
module Types class PostType < Types::BaseObject name "Post"
field :user, UserType, null: false def user object.user # N+1 queries end
end end
module Types class UserType < Types::BaseObject name "User"
field :name, String, null: false
end end
If we want to execute a simple query like the following, we will get N+1 queries for each
post.user:
query = " { posts { user { name } } } " MyProjectSchema.execute(query)
To avoid this problem, all we have to do is to change the resolver to return
BatchLoader::GraphQL(#32 explains why not just
BatchLoader):
module Types class PostType < Types::BaseObject name "Post"field :user, UserType, null: false def user BatchLoader::GraphQL.for(object.user_id).batch do |user_ids, loader| User.where(id: user_ids).each { |user| loader.call(user.id, user) } end end
end end
And setup GraphQL to use the built-in
lazy_resolvemethod:
class MyProjectSchema < GraphQL::Schema query Types::QueryType use BatchLoader::GraphQL end
That's it.
For batches where there is no item in response to a call, we normally return
nil. However, you can use
:default_valueto return something else instead:
BatchLoader.for(post.user_id).batch(default_value: NullUser.new) do |user_ids, loader| User.where(id: user_ids).each { |user| loader.call(user.id, user) } end
For batches where the value is some kind of collection, such as an Array or Hash,
loaderalso supports being called with a block, which yields the current value, and returns the next value. This is extremely useful for 1:Many (
has_many) relationships:
BatchLoader.for(user.id).batch(default_value: []) do |user_ids, loader| Comment.where(user_id: user_ids).each do |comment| loader.call(comment.user_id) { |memo| memo << comment } end end
It's possible to reuse the same
BatchLoader#batchblock for loading different types of data by specifying a unique
key. For example, with polymorphic associations:
def lazy_association(post) id = post.association_id key = post.association_typeBatchLoader.for(id).batch(key: key) do |ids, loader, args| model = Object.const_get(args[:key]) model.where(id: ids).each { |record| loader.call(record.id, record) } end end post1 = Post.save(association_id: 1, association_type: 'Tag') post2 = Post.save(association_id: 1, association_type: 'Category')
lazy_association(post1) # SELECT * FROM tags WHERE id IN (1) lazy_association(post2) # SELECT * FROM categories WHERE id IN (1)
It's also required to pass custom
keywhen using
BatchLoaderwith metaprogramming (e.g.
eval).
By default
BatchLoadercaches the loaded values. You can test it by running something like:
def user_lazy(id) BatchLoader.for(id).batch do |ids, loader| User.where(id: ids).each { |user| loader.call(user.id, user) } end endputs user_lazy(1) # SELECT * FROM users WHERE id IN (1)
=>
puts user_lazy(1) # no request
=>
Usually, it's just enough to clear the cache between HTTP requests in the app. To do so, simply add the middleware:
use BatchLoader::Middleware
To drop the cache manually you can run:
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1) puts user_lazy(1) # no requestBatchLoader::Executor.clear_current
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
In some rare cases it's useful to disable caching for
BatchLoader. For example, in tests or after data mutations:
def user_lazy(id) BatchLoader.for(id).batch(cache: false) do |ids, loader| # ... end endputs user_lazy(1) # SELECT * FROM users WHERE id IN (1) puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
If you set
cache: false, it's likely you also want
replace_methods: false(see below section).
By default,
BatchLoaderreplaces methods on its instance by calling
#define_methodafter batching to copy methods from the loaded value. This consumes some time but allows to speed up any future method calls on the instance. In some cases, when there are a lot of instances with a huge number of defined methods, this initial process of replacing the methods can be slow. You may consider avoiding the "up front payment" and "pay as you go" with
#method_missingby disabling the method replacement:
BatchLoader.for(id).batch(replace_methods: false) do |ids, loader| # ... end
Add this line to your application's Gemfile:
gem 'batch-loader'
And then execute:
$ bundle
Or install it yourself as:
$ gem install batch-loader
BatchLoader.for(item).batch( default_value: default_value, cache: cache, replace_methods: replace_methods, key: key ) do |items, loader, args| # ... end
| Argument Key | Default | Description | | --------------- | --------------------------------------------- | ------------------------------------------------------------- | |
item| - | Item which will be collected and used for batching. | |
default_value|
nil| Value returned by default after batching. | |
cache|
true| Set
falseto disable caching between the same executions. | |
replace_methods|
true| Set
falseto use
#method_missinginstead of replacing the methods after batching. | |
key|
nil| Pass custom key to uniquely identify the batch block. | |
items| - | List of collected items for batching. | |
loader| - | Lambda which should be called to load values loaded in batch. | |
args|
{default_value: nil, cache: true, replace_methods: true, key: nil}| Arguments passed to the
batchmethod. |
These gems are built by using
BatchLoader:
BatchLoaderin other programming languages:
See the slides [37-42].
After checking out the repo, run
bin/setupto install dependencies. Then, run
rake specto run the tests. You can also run
bin/consolefor an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run
bundle exec rake install. To release a new version, update the version number in
version.rb, and then run
bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the
.gemfile to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/exAspArk/batch-loader. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
There are some other Ruby implementations for batching such as:
However,
batch-loaderhas some differences:
batchmethod.
nilvalues for the missing ones. Instead, it provides the
loaderlambda which simply maps an item to the loaded object.
The gem is available as open source under the terms of the MIT License.
Everyone interacting in the Batch::Loader project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.