Need help with batch-loader?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

exAspArk
816 Stars 40 Forks MIT License 115 Commits 2 Opened issues

Description

:zap: Powerful tool for avoiding N+1 DB or HTTP queries

Services available

!
?

Need anything else?

Contributors list

# 62,021
Ruby
GraphQL
guard
Node.js
68 commits
# 99,706
Erlang
sequenc...
batchin...
GraphQL
7 commits
# 265,443
Rails
HTML
Shell
batchin...
5 commits
# 21,256
Ruby
Elixir
git-ser...
gitlab
2 commits
# 9,536
Ruby
Atom
opal
gitlab
2 commits
# 10,360
Ruby
sidekiq
Rails
gitlab
1 commit
# 21,986
Ruby
highlig...
gitlab
Rails
1 commit
# 471,805
Ruby
Shell
batchin...
GraphQL
1 commit
# 10,499
Koa
opam
twilio
Markdow...
1 commit
# 121,098
Markdow...
github-...
Dart
Flutter
1 commit
# 469,094
Ruby
Shell
batchin...
GraphQL
1 commit
# 396,092
opensca...
C
Shell
data-st...
1 commit

BatchLoader

Build Status Coverage Status Code Climate Downloads Latest Version

This gem provides a generic lazy batching mechanism to avoid N+1 DB queries, HTTP queries, etc.

Developers from these companies use

BatchLoader
:

GitLab Netflix Alibaba Universe Wealthsimple Decidim

Contents

Highlights

  • Generic utility to avoid N+1 DB queries, HTTP requests, etc.
  • Adapted Ruby implementation of battle-tested tools like Haskell Haxl, JS DataLoader, etc.
  • Batching is isolated and lazy, load data in batch where and when it's needed.
  • Automatically caches previous queries (identity map).
  • Thread-safe (
    loader
    ).
  • No need to share batching through variables or custom defined classes.
  • No dependencies, no monkey-patches, no extra primitives such as Promises.

Usage

Why?

Let's have a look at the code with N+1 queries:

def load_posts(ids)
  Post.where(id: ids)
end

posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3) # _ ↓ _ # ↙ ↓ ↘ users = posts.map do |post| # U ↓ ↓ SELECT * FROM users WHERE id = 1 post.user # ↓ U ↓ SELECT * FROM users WHERE id = 2 end # ↓ ↓ U SELECT * FROM users WHERE id = 3 # ↘ ↓ ↙ # ¯ ↓ ¯ puts users # Users

The naive approach would be to preload dependent objects on the top level:

# With ORM in basic cases
def load_posts(ids)
  Post.where(id: ids).includes(:user)
end

But without ORM or in more complicated cases you will have to do something like:

def load_posts(ids)

load posts

posts = Post.where(id: ids) user_ids = posts.map(&:user_id)

load users

users = User.where(id: user_ids) user_by_id = users.each_with_object({}) { |user, memo| memo[user.id] = user }

map user to post

posts.each { |post| post.user = user_by_id[post.user_id] } end

posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3) # _ ↓ _ SELECT * FROM users WHERE id IN (1, 2, 3) # ↙ ↓ ↘ users = posts.map do |post| # U ↓ ↓ post.user # ↓ U ↓ end # ↓ ↓ U # ↘ ↓ ↙ # ¯ ↓ ¯ puts users # Users

But the problem here is that

load_posts
now depends on the child association and knows that it has to preload data for future use. And it'll do it every time, even if it's not necessary. Can we do better? Sure!

Basic example

With

BatchLoader
we can rewrite the code above:
def load_posts(ids)
  Post.where(id: ids)
end

def load_user(post) BatchLoader.for(post.user_id).batch do |user_ids, loader| User.where(id: user_ids).each { |user| loader.call(user.id, user) } end end

posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3) # _ ↓ _ # ↙ ↓ ↘ users = posts.map do |post| # BL ↓ ↓ load_user(post) # ↓ BL ↓ end # ↓ ↓ BL # ↘ ↓ ↙ # ¯ ↓ ¯ puts users # Users SELECT * FROM users WHERE id IN (1, 2, 3)

As we can see, batching is isolated and described right in a place where it's needed.

How it works

In general,

BatchLoader
returns a lazy object. Each lazy object knows which data it needs to load and how to batch the query. As soon as you need to use the lazy objects, they will be automatically loaded once without N+1 queries.

So, when we call

BatchLoader.for
we pass an item (
user_id
) which should be collected and used for batching later. For the
batch
method, we pass a block which will use all the collected items (
user_ids
):
BatchLoader.for(post.user_id).batch do |user_ids, loader|
  ...
end

Inside the block we execute a batch query for our items (

User.where
). After that, all we have to do is to call
loader
by passing an item which was used in
BatchLoader.for
method (
user_id
) and the loaded object itself (
user
):
BatchLoader.for(post.user_id).batch do |user_ids, loader|
  User.where(id: user_ids).each { |user| loader.call(user.id, user) }
end

When we call any method on the lazy object, it'll be automatically loaded through batching for all instantiated

BatchLoader
s:
puts users # => SELECT * FROM users WHERE id IN (1, 2, 3)

For more information, see the Implementation details section.

RESTful API example

Now imagine we have a regular Rails app with N+1 HTTP requests:

# app/models/post.rb
class Post < ApplicationRecord
  def rating
    HttpClient.request(:get, "https://example.com/ratings/#{id}")
  end
end

app/controllers/posts_controller.rb

class PostsController < ApplicationController def index posts = Post.limit(10) serialized_posts = posts.map { |post| {id: post.id, rating: post.rating} } # N+1 HTTP requests for each post.rating

render json: serialized_posts

end end

As we can see, the code above will make N+1 HTTP requests, one for each post. Let's batch the requests with a gem called parallel:

class Post < ApplicationRecord
  def rating_lazy
    BatchLoader.for(post).batch do |posts, loader|
      Parallel.each(posts, in_threads: 10) { |post| loader.call(post, post.rating) }
    end
  end

...

end

loader
is thread-safe. So, if
HttpClient
is also thread-safe, then with
parallel
gem we can execute all HTTP requests concurrently in threads (there are some benchmarks for concurrent HTTP requests in Ruby). Thanks to Matz, MRI releases GIL when thread hits blocking I/O – HTTP request in our case.

In the controller, all we have to do is to replace

post.rating
with the lazy
post.rating_lazy
:
class PostsController < ApplicationController
  def index
    posts = Post.limit(10)
    serialized_posts = posts.map { |post| {id: post.id, rating: post.rating_lazy} }

render json: serialized_posts

end end

BatchLoader
caches the loaded values. To ensure that the cache is purged between requests in the app add the following middleware to your
config/application.rb
:
config.middleware.use BatchLoader::Middleware

See the Caching section for more information.

GraphQL example

Batching is particularly useful with GraphQL. Using such techniques as preloading data in advance to avoid N+1 queries can be very complicated, since a user can ask for any available fields in a query.

Let's take a look at the simple graphql-ruby schema example:

class MyProjectSchema < GraphQL::Schema
  query Types::QueryType
end

module Types class QueryType < Types::BaseObject field :posts, [PostType], null: false

def posts
  Post.all
end

end end

module Types class PostType < Types::BaseObject name "Post"

field :user, UserType, null: false

def user
  object.user # N+1 queries
end

end end

module Types class UserType < Types::BaseObject name "User"

field :name, String, null: false

end end

If we want to execute a simple query like the following, we will get N+1 queries for each

post.user
:
query = "
{
  posts {
    user {
      name
    }
  }
}
"
MyProjectSchema.execute(query)

To avoid this problem, all we have to do is to change the resolver to return

BatchLoader::GraphQL
(#32 explains why not just
BatchLoader
):
module Types
  class PostType < Types::BaseObject
    name "Post"

field :user, UserType, null: false

def user
  BatchLoader::GraphQL.for(object.user_id).batch do |user_ids, loader|
    User.where(id: user_ids).each { |user| loader.call(user.id, user) }
  end
end

end end

And setup GraphQL to use the built-in

lazy_resolve
method:
class MyProjectSchema < GraphQL::Schema
  query Types::QueryType
  use BatchLoader::GraphQL
end

That's it.

Loading multiple items

For batches where there is no item in response to a call, we normally return

nil
. However, you can use
:default_value
to return something else instead:
BatchLoader.for(post.user_id).batch(default_value: NullUser.new) do |user_ids, loader|
  User.where(id: user_ids).each { |user| loader.call(user.id, user) }
end

For batches where the value is some kind of collection, such as an Array or Hash,

loader
also supports being called with a block, which yields the current value, and returns the next value. This is extremely useful for 1:Many (
has_many
) relationships:
BatchLoader.for(user.id).batch(default_value: []) do |user_ids, loader|
  Comment.where(user_id: user_ids).each do |comment|
    loader.call(comment.user_id) { |memo| memo << comment }
  end
end

Batch key

It's possible to reuse the same

BatchLoader#batch
block for loading different types of data by specifying a unique
key
. For example, with polymorphic associations:
def lazy_association(post)
  id = post.association_id
  key = post.association_type

BatchLoader.for(id).batch(key: key) do |ids, loader, args| model = Object.const_get(args[:key]) model.where(id: ids).each { |record| loader.call(record.id, record) } end end post1 = Post.save(association_id: 1, association_type: 'Tag') post2 = Post.save(association_id: 1, association_type: 'Category')

lazy_association(post1) # SELECT * FROM tags WHERE id IN (1) lazy_association(post2) # SELECT * FROM categories WHERE id IN (1)

It's also required to pass custom

key
when using
BatchLoader
with metaprogramming (e.g.
eval
).

Caching

By default

BatchLoader
caches the loaded values. You can test it by running something like:
def user_lazy(id)
  BatchLoader.for(id).batch do |ids, loader|
    User.where(id: ids).each { |user| loader.call(user.id, user) }
  end
end

puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)

=>

puts user_lazy(1) # no request

=>

Usually, it's just enough to clear the cache between HTTP requests in the app. To do so, simply add the middleware:

use BatchLoader::Middleware

To drop the cache manually you can run:

puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
puts user_lazy(1) # no request

BatchLoader::Executor.clear_current

puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)

In some rare cases it's useful to disable caching for

BatchLoader
. For example, in tests or after data mutations:
def user_lazy(id)
  BatchLoader.for(id).batch(cache: false) do |ids, loader|
    # ...
  end
end

puts user_lazy(1) # SELECT * FROM users WHERE id IN (1) puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)

If you set

cache: false
, it's likely you also want
replace_methods: false
(see below section).

Replacing methods

By default,

BatchLoader
replaces methods on its instance by calling
#define_method
after batching to copy methods from the loaded value. This consumes some time but allows to speed up any future method calls on the instance. In some cases, when there are a lot of instances with a huge number of defined methods, this initial process of replacing the methods can be slow. You may consider avoiding the "up front payment" and "pay as you go" with
#method_missing
by disabling the method replacement:
BatchLoader.for(id).batch(replace_methods: false) do |ids, loader|
  # ...
end

Installation

Add this line to your application's Gemfile:

gem 'batch-loader'

And then execute:

$ bundle

Or install it yourself as:

$ gem install batch-loader

API

BatchLoader.for(item).batch(
  default_value: default_value,
  cache: cache,
  replace_methods: replace_methods,
  key: key
) do |items, loader, args|
  # ...
end

| Argument Key | Default | Description | | --------------- | --------------------------------------------- | ------------------------------------------------------------- | |

item
| - | Item which will be collected and used for batching. | |
default_value
|
nil
| Value returned by default after batching. | |
cache
|
true
| Set
false
to disable caching between the same executions. | |
replace_methods
|
true
| Set
false
to use
#method_missing
instead of replacing the methods after batching. | |
key
|
nil
| Pass custom key to uniquely identify the batch block. | |
items
| - | List of collected items for batching. | |
loader
| - | Lambda which should be called to load values loaded in batch. | |
args
|
{default_value: nil, cache: true, replace_methods: true, key: nil}
| Arguments passed to the
batch
method. |

Related tools

These gems are built by using

BatchLoader
:

BatchLoader
in other programming languages:

Implementation details

See the slides [37-42].

Development

After checking out the repo, run

bin/setup
to install dependencies. Then, run
rake spec
to run the tests. You can also run
bin/console
for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run

bundle exec rake install
. To release a new version, update the version number in
version.rb
, and then run
bundle exec rake release
, which will create a git tag for the version, push git commits and tags, and push the
.gem
file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/exAspArk/batch-loader. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Alternatives

There are some other Ruby implementations for batching such as:

However,

batch-loader
has some differences:
  • It is implemented for general usage and can be used not only with GraphQL. In fact, we use it for RESTful APIs and GraphQL on production at the same time.
  • It doesn't try to mimic implementations in other programming languages which have an asynchronous nature. So, it doesn't load extra dependencies to bring such primitives as Promises, which are not very popular in Ruby community. Instead, it uses the idea of lazy objects, which are included in the Ruby standard library. These lazy objects allow one to return the necessary data at the end when it's necessary.
  • It doesn't force you to share batching through variables or custom defined classes, just pass a block to the
    batch
    method.
  • It doesn't require to return an array of the loaded objects in the same order as the passed items. I find it difficult to satisfy these constraints: to sort the loaded objects and add
    nil
    values for the missing ones. Instead, it provides the
    loader
    lambda which simply maps an item to the loaded object.
  • It doesn't depend on any other external dependencies. For example, no need to load huge external libraries for thread-safety, the gem is thread-safe out of the box.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Batch::Loader project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.