Fast memory statistics and better out-of-band GC
memstat offers a fast way to retrieve the memory usage of the current process by providing object mappings for /proc/[pid]/status and /proc/[pid]/smaps on Linux.
If you've ever called the ps -o rss command from inside a Ruby process to capture real memory usage, chances are you've already learned that it is very slow.
That's because shelling out to ps forks a copy of the Ruby process (typically 70-150MB for a Rails app) and then replaces that copy with the ps executable. Even with copy-on-write and posix_spawn optimizations, you can't beat the speed of reading statistics directly from memory maintained by the kernel.
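To make the contrast concrete, here is a minimal plain-Ruby sketch of both approaches; it does not use memstat, and the helper names `rss_kb_via_ps`, `parse_vm_rss_kb` and `rss_kb_via_proc` are hypothetical. The /proc version simply reads the `VmRSS` line of /proc/[pid]/status, which the kernel reports in kB, so no child process is created.

```ruby
# Hypothetical helpers for illustration; not part of memstat's API.

# Shelling out: Ruby starts a child process and runs ps in it.
# `rss=` suppresses the RSS header so that to_i sees only the number.
def rss_kb_via_ps(pid = Process.pid)
  `ps -o rss= -p #{pid}`.strip.to_i
end

# Extracts the VmRSS value (in kB) from the text of /proc/[pid]/status.
def parse_vm_rss_kb(status_text)
  line = status_text.each_line.find { |l| l.start_with?("VmRSS:") }
  raise ArgumentError, "no VmRSS line found" unless line
  line[/\d+/].to_i
end

# Reading /proc: a single file read, no child process (Linux only).
def rss_kb_via_proc(pid = Process.pid)
  parse_vm_rss_kb(File.read("/proc/#{pid}/status"))
end
```

Both return kB; the difference is that the first pays for process creation on every call.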
For a typical Rails app, memstat is about 130 times faster than ps -o rss (6.30 s vs. 0.048 s for 100 calls):
require 'benchmark'
require 'memstat'

Benchmark.bm(10) do |x|
  # `rss=` suppresses ps's header line so that to_i parses the number
  x.report("ps:")      { 100.times { `ps -o rss= -p #{Process.pid}`.strip.to_i } }
  x.report("memstat:") { 100.times { Memstat::Proc::Status.new(pid: Process.pid).rss } }
end

                user     system      total        real
ps:         0.110000   4.280000   6.260000 (  6.302661)
memstat:    0.040000   0.000000   0.040000 (  0.048166)
Tested on Linode with a Rails app of 140MB memory usage.
Add this line to your application's Gemfile:
gem 'memstat'
Or install it yourself as:
$ gem install memstat
status = Memstat::Proc::Status.new(pid: Process.pid)
status.peak  # Peak VM size
status.size  # Current VM size
status.lck   # mlock-ed memory size (unswappable)
status.pin   # Pinned memory size (unswappable and fixed physical address)
status.hwm   # Peak physical memory size
status.rss   # Current physical memory size
status.data  # Data area size
status.stk   # Stack size
status.exe   # Text (executable) size
status.lib   # Loaded library size
status.pte   # Page table size
status.swap  # Swap size
See details for each item.
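The accessor names appear to mirror the `Vm*` fields of /proc/[pid]/status documented in proc(5). The sketch below spells out that assumed correspondence and parses every field into bytes; the `STATUS_FIELDS` table and `parse_status` helper are hypothetical and independent of memstat's actual implementation.

```ruby
# Assumed mapping from the accessor names above to /proc/[pid]/status keys.
STATUS_FIELDS = {
  peak: "VmPeak", size: "VmSize", lck: "VmLck", pin: "VmPin",
  hwm: "VmHWM", rss: "VmRSS", data: "VmData", stk: "VmStk",
  exe: "VmExe", lib: "VmLib", pte: "VmPTE", swap: "VmSwap"
}.freeze

# Parses the text of /proc/[pid]/status into { accessor => bytes }.
# The kernel reports these fields in kB, so each value is multiplied by 1024.
# Fields missing from the text (e.g. on older kernels) are left out.
def parse_status(text)
  raw = {}
  text.each_line do |line|
    key, value = line.split(":", 2)
    raw[key] = value.to_i * 1024 if value && value.strip.end_with?("kB")
  end
  STATUS_FIELDS.each_with_object({}) do |(name, key), result|
    result[name] = raw[key] if raw.key?(key)
  end
end
```

On Linux, `parse_status(File.read("/proc/self/status"))[:rss]` gives the current process's resident size in bytes.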
For shared memory status between forked processes:
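smaps = Memstat::Proc::Smaps.new(pid: Process.pid)
smaps.size           # Virtual size of the mappings
smaps.rss            # Resident set size
smaps.pss            # Proportional set size (shared pages divided among the processes sharing them)
smaps.shared_clean   # Shared pages that are clean (not written to)
smaps.shared_dirty   # Shared pages that are dirty (written to)
smaps.private_clean  # Private pages that are clean
smaps.private_dirty  # Private pages that are dirty
smaps.swap           # Swapped-out memory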
See this question.
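/proc/[pid]/smaps contains one block per memory mapping, each with lines such as `Rss:  40 kB`, so a per-process total is obtained by summing each field across all mappings. The sketch below does that in plain Ruby; the `sum_smaps` name and the `SMAPS_FIELDS` list are assumptions for illustration, not memstat's code.

```ruby
SMAPS_FIELDS = %w[Size Rss Pss Shared_Clean Shared_Dirty
                  Private_Clean Private_Dirty Swap].freeze

# Sums the selected kB fields over every mapping in /proc/[pid]/smaps text.
# Returns a hash like { "Rss" => 131020, ... } with values in kB.
def sum_smaps(text)
  totals = Hash.new(0)
  text.each_line do |line|
    # Matches lines such as "Shared_Dirty:   12 kB"; mapping header lines
    # (address ranges) and non-kB lines such as VmFlags do not match.
    next unless (m = line.match(/\A(\w+):\s+(\d+) kB/))
    totals[m[1]] += m[2].to_i if SMAPS_FIELDS.include?(m[1])
  end
  SMAPS_FIELDS.map { |f| [f, totals[f]] }.to_h
end
```

On Linux, `sum_smaps(File.read("/proc/#{Process.pid}/smaps"))` returns the totals for the current process.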
memstat also comes with a command line utility that reports detailed memory statistics by aggregating /proc/[pid]/smaps.
This is useful for examining the effectiveness of copy-on-write in forking clusters such as Unicorn, Passenger, Puma and Resque.
Usage:
$ memstat smaps [PID]
will give you the following result:
Process:      13405
Command Line: unicorn master -D -E staging -c /path/to/current/config/unicorn.rb
Memory Summary:
  size             274,852 kB
  rss              131,020 kB
  pss               66,519 kB
  shared_clean       8,408 kB
  shared_dirty      95,128 kB
  private_clean          8 kB
  private_dirty     27,476 kB
  swap                   0 kB
In this case, 103,536 kB (shared_clean + shared_dirty) out of 131,020 kB of rss is shared, which means about 79% of the master's resident memory is shared with the worker processes.
For more details, read this gist.
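The percentage in the example is (shared_clean + shared_dirty) / rss, where rss is the sum of the four shared/private fields (8,408 + 95,128 + 8 + 27,476 = 131,020 kB). A small sketch of that arithmetic, using a hypothetical `shared_ratio` helper and the numbers from the summary above:

```ruby
# Fraction of resident memory that is shared with other processes.
# Inputs are the kB values from the smaps summary.
def shared_ratio(shared_clean:, shared_dirty:, rss:)
  (shared_clean + shared_dirty).fdiv(rss)
end

ratio = shared_ratio(shared_clean: 8_408, shared_dirty: 95_128, rss: 131_020)
puts format("%.0f%% shared", ratio * 100)  # prints "79% shared"
```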
Ruby 2.1 introduced generational garbage collection. It is a major improvement, with shorter GC pauses and higher overall throughput, but it comes with the drawback of potential memory bloat.
You can mitigate the bloat by running GC.start manually, but, as with Unicorn's out-of-band GC, doing it after every request can seriously hurt performance. You want to run GC.start only when the process grows larger than X MB.
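Check the memory usage and run GC if it exceeds the threshold:
require 'memstat'

if Memstat.linux?
  status = Memstat::Proc::Status.new(pid: Process.pid)
  # 150.megabytes comes from ActiveSupport (Rails); use 150 * 1024**2 in plain Ruby
  if status.rss > 150.megabytes
    GC.start
  end
end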
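A plain-Ruby variant that needs neither memstat nor ActiveSupport can read `VmRSS` from /proc/self/status directly. This is a minimal sketch under that assumption; the `OobGcCheck` class and its injectable `rss_reader` (added so the threshold logic can be exercised without real memory pressure) are hypothetical.

```ruby
# Hypothetical helper; not part of memstat.
class OobGcCheck
  # Reads the current RSS in bytes from /proc/self/status (Linux only).
  # Returns 0 if the VmRSS line is missing, so GC is never triggered.
  DEFAULT_RSS_READER = lambda do
    line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
    line ? line[/\d+/].to_i * 1024 : 0
  end

  def initialize(threshold_bytes, rss_reader: DEFAULT_RSS_READER)
    @threshold_bytes = threshold_bytes
    @rss_reader = rss_reader
  end

  # Runs a full GC when RSS exceeds the threshold; returns true if it did.
  def call
    return false unless @rss_reader.call > @threshold_bytes
    GC.start
    true
  end
end

check = OobGcCheck.new(150 * 1024**2)  # 150MB threshold
check.call if File.exist?("/proc/self/status")
```

Calling `check.call` between requests keeps the cost to one small file read unless the threshold is crossed.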
For Unicorn, add these lines to your config.ru (above the line that loads the environment) to check the memory size on every request and run GC out-of-band:
require 'memstat'
use Memstat::OobGC::Unicorn, 150*(1024**2) # Invoke GC if the process is bigger than 150MB