Ruby EventMachine file tailing and friends. 'gem install eventmachine-tail' to install.
This project contains two "EventMachine":http://wiki.github.com/eventmachine/eventmachine/ extensions.
First, it adds event-driven file following similar to the Unix 'tail -f' command. For example, you could use it to follow /var/log/messages the same way tail -f would.
Second, it adds event-driven file patterns allowing you to watch a given file pattern for new or removed files. For example, you could watch /var/log/*.log for new/deleted files.
For "logstash":http://code.google.com/p/logstash/, the log agents were event-driven using EventMachine. The log agents mainly get their data from logfiles. To that end, we needed a way to treat log files as a stream.
There's a Ruby gem, 'file-tail', that implements tailing, but not in an event-driven way. This makes it hard to use in EventMachine programs like logstash.
Thus, eventmachine-tail was born.
Further, the usage patterns for logstash required the ability to watch a directory (or a file pattern) for new log files.
h2. Get eventmachine-tail
To install eventmachine-tail, you only need to use gem:
bc. gem install eventmachine-tail
This will also install the rtail tool described below.
h2. Implementation Thoughts
h3. "Don't block the reactor"
EventMachine::FileTail will only read one chunk (64K by default) from any file for each tick of the reactor. This helps ensure we don't spend too much time reading from the file.
A consequence of not reading the file "as fast as possible until eof" is that larger data sets take longer to read. There is a small delay between reads, because the reactor only runs about once every 50ms (depending on the polling features used).
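The one-chunk-per-tick idea can be sketched in plain Ruby. This is only an illustration, not the code used by EventMachine::FileTail: the class name @ChunkReader@ and the method @read_one_chunk@ are hypothetical, and the reactor itself is left out.

```ruby
# Hypothetical sketch: NOT the actual EventMachine::FileTail implementation.
# It only illustrates "read at most one chunk per call, then yield control".
class ChunkReader
  DEFAULT_CHUNK_SIZE = 64 * 1024 # the 64K default mentioned above

  def initialize(path, chunk_size = DEFAULT_CHUNK_SIZE)
    @file = File.open(path, "rb")
    @chunk_size = chunk_size
  end

  # Read at most one chunk and return it, or nil when at end of file.
  # A reactor would call this once per tick instead of looping until EOF.
  def read_one_chunk
    @file.read(@chunk_size)
  end

  def close
    @file.close
  end
end
```

Each call returns quickly, so other work scheduled on the reactor gets a chance to run between reads. The cost is the per-tick delay described above.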
h3. EOF Handling
When we are at the end of the file, we rely on EventMachine::FileWatch to notify us when the file changes. On Linux this uses inotify; on OS X and FreeBSD it uses kqueue to get event-driven notifications of file changes.
h3. File truncation
If the file length ever shortens, we assume this means the file was truncated. When this happens, we seek to the beginning of the file and continue reading.
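A minimal sketch of such a truncation check, using only the Ruby standard library. The helper name @next_read_position@ is hypothetical, and comparing the current size against the last read position is an assumption about how "shortens" could be detected, not a description of the library's code.

```ruby
# Hypothetical helper: NOT part of eventmachine-tail.
# Returns the offset to continue reading from. If the file is now shorter
# than the position we had reached, assume it was truncated and restart
# from the beginning of the file.
def next_read_position(path, current_position)
  size = File.stat(path).size
  size < current_position ? 0 : current_position
end
```

A caller would seek to the returned offset before its next read.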
h3. File rotation
If we notice that the inode (or underlying device) has changed, we will reopen the file by pathname. This change is generally an indication that our file has been rotated as part of some periodic log rotation.
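The inode/device comparison can be sketched with @File::Stat@. The method name @rotated?@ is hypothetical and this is not the library's own code. It assumes a POSIX filesystem where inode numbers are meaningful, and treating a temporarily missing path as "not rotated yet" is a design choice of this sketch.

```ruby
# Hypothetical helper: NOT part of eventmachine-tail.
# Compares the file we have open against whatever currently lives at `path`.
# A different inode or device means the path now points at a different file,
# which usually indicates log rotation.
def rotated?(open_file, path)
  on_disk = File.stat(path)
  opened = open_file.stat
  on_disk.ino != opened.ino || on_disk.dev != opened.dev
rescue Errno::ENOENT
  # The path is missing (for example, mid-rotation). Report "not rotated"
  # so the caller checks again later.
  false
end
```

When this returns true, the caller would close its handle and reopen the file by pathname.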
h3. Glob patterns
You can also use file "glob patterns":http://en.wikipedia.org/wiki/Glob_(programming) to select files to tail. This pattern is checked periodically to see if new files are found.
For example, if your Java app writes to logfiles with generated filenames that are hard to predict, you can use EventMachine::FileGlobWatch to pick up those files as they are created.
Globs are implemented in a polling manner because of the "non-obviousness" of implementing a glob notifier with file watches. Implementing globs with file watches is on my todo list, however.
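Polling a glob amounts to re-running the pattern and comparing the result with the previous run. The sketch below shows that idea with @Dir.glob@ and a @Set@. The class @GlobPoller@ and its @poll@ method are hypothetical names for illustration, not the EventMachine::FileGlobWatch API.

```ruby
require "set"

# Hypothetical sketch: NOT the EventMachine::FileGlobWatch implementation.
# Each call to #poll re-evaluates the glob and reports what changed.
class GlobPoller
  def initialize(pattern)
    @pattern = pattern
    @known = Set.new
  end

  # Returns [found, deleted]: paths that appeared and paths that
  # disappeared since the previous poll, each as a sorted array.
  def poll
    current = Set.new(Dir.glob(@pattern))
    found = (current - @known).to_a.sort
    deleted = (@known - current).to_a.sort
    @known = current
    [found, deleted]
  end
end
```

In an event-driven program, a periodic timer would call @poll@ and start or stop tailing for each reported path.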
h2. Example code
For a simple file tailing example, look at "tail.rb":http://github.com/jordansissel/eventmachine-tail/blob/master/samples/tail.rb
For an example of using glob watching file patterns, look at "globwatch.rb":http://github.com/jordansissel/eventmachine-tail/blob/master/samples/globwatch.rb
For an example of using both glob watching and file tailing, check out "glob-tail.rb":http://github.com/jordansissel/eventmachine-tail/blob/master/samples/glob-tail.rb - this project provides a glue class that helps you easily tail globs.
h2. rtail tool
eventmachine-tail comes with a script called 'rtail'.
This script lets you tail files much like tail(1) does, but it can also use the glob watching feature of eventmachine-tail to watch file path patterns (globs) in a tail-ey way.
For example, if you want to 'tail -f' all files, recursively, in /var/log except ones matching '*.gz', you would use:
bc. rtail -x "*.gz" "/var/log/**/*"
This will follow any existing files and follow newly-created ones that match the glob given as they are created.
By default, rtail checks the glob pattern every 5 seconds. You can change this value with the '-i' flag.
You can also tell rtail to not prefix each line with the filename by giving the '-n' flag.