The Logstash program for collecting and processing logs is popular and commonly used to process e.g. syslog messages and HTTP logs.
Apart from ingesting log events and sending them to one or more destinations it can transform the events in various ways, including extracting discrete fields from flat blocks of text, joining multiple physical lines into single logical events, parsing JSON and XML, and deleting unwanted events. It uses its own domain-specific configuration language to describe inputs, outputs, and the filters that should be applied to events.
Writing the filter configurations necessary to parse events isn't difficult for someone with basic programming skills, but verifying that the filters do what you expect can be tedious, especially when you tweak existing filters and want to make sure that all kinds of logs will continue to be processed as before. If you get something wrong you might have millions of incorrectly parsed events before you realize your mistake.
This is where Logstash Filter Verifier comes in. It lets you define test case files containing lines of input together with the expected output from Logstash. Pass one or more such test case files to Logstash Filter Verifier together with all of your Logstash filter configuration files and it'll run Logstash for you and verify that Logstash actually returns what you expect.
Before you can run Logstash Filter Verifier you need to install it. After covering that, let's start with a simple example and follow up with reference documentation.
All releases of Logstash Filter Verifier are published in binary form for the most common platforms at github.com/magnusbaeck/logstash-filter-verifier/releases.
If you need to run the program on other platforms or if you want to modify the program yourself you can build and use it on any platform for which a recent Go compiler is available. Pretty much any platform where Logstash runs should be fine, including Windows.
Many Linux distributions make some version of the Go compiler easily installable, but otherwise you can download and install the latest version. The source code is written to use Go modules for dependency management and it seems you need at least Go 1.13.
To just build an executable file you don't need anything but the Go compiler; just clone the Logstash Filter Verifier repository and run `go build` from the root directory of the cloned repository. If successful you'll find an executable in the current directory.
One drawback of this is that the program won't get stamped with the correct version number (so `logstash-filter-verifier --version` will say "unknown"). To address this and make it easy to run tests and static checks you need GNU make and other GNU tools.
The makefile can also be used to install Logstash Filter Verifier centrally, by default in /usr/local/bin but you can change that by modifying the PREFIX variable. For example, to install it in $HOME/bin (which is probably in your shell's path) you can issue the following command:
$ make install PREFIX=$HOME
The examples that follow build upon each other and do not only show how to use Logstash Filter Verifier to test that particular kind of log. They also highlight how to deal with different features in logs.
Logstash is often used to parse syslog messages, so let's use that as a first example.
Test case files are in JSON or YAML format and contain a single object with a handful of supported properties.
Sample with JSON format:
```json
{
  "fields": {
    "type": "syslog"
  },
  "testcases": [
    {
      "input": [
        "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
      ],
      "expected": [
        {
          "@timestamp": "2015-10-06T20:55:29.000Z",
          "host": "myhost",
          "message": "This is a test message",
          "pid": 31993,
          "program": "myprogram",
          "type": "syslog"
        }
      ]
    }
  ]
}
```
Sample with YAML format:
```yaml
fields:
  type: "syslog"
testcases:
  - input:
      - "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        pid: 31993
        program: "myprogram"
        type: "syslog"
```
Most Logstash configurations contain filters for multiple kinds of logs and use conditions on field values to select which filters to apply. Those field values are typically set in the input plugins. To make Logstash treat the test events correctly we can "inject" additional field values to make the test events look like the real events to Logstash. In this example, `fields.type` is set to "syslog", which means that the input events in the test cases in this file will have that value in their `type` field when they're passed to Logstash.
Next, in `input`, we define a single test string that we want to feed through Logstash, and the `expected` array contains the single event we expect Logstash to emit for the given input.
The `testcases` array can contain multiple objects with `input` and `expected` keys. For example, if we change the example above to
```yaml
fields:
  type: "syslog"
testcases:
  - input:
      - "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        pid: 31993
        program: "myprogram"
        type: "syslog"
  - input:
      - "Oct 6 20:55:29 myhost myprogram: This is a test message"
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        host: "myhost"
        message: "This is a test message"
        program: "myprogram"
        type: "syslog"
```
we also test syslog messages that lack the bracketed pid after the program name.
Note that UTC is the assumed timezone for input events to avoid different behavior depending on the timezone of the machine where Logstash Filter Verifier happens to run. This won't affect time formats that include a timezone.
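To make the UTC assumption concrete, here is a minimal Python sketch that derives the `@timestamp` value you would write in `expected` for a classic syslog timestamp. The helper name `expected_timestamp` is hypothetical and not part of Logstash Filter Verifier, and the year is passed in explicitly because syslog timestamps of this kind don't contain one.

```python
from datetime import datetime, timezone


def expected_timestamp(syslog_time: str, year: int) -> str:
    """Render a classic syslog timestamp as an @timestamp string, assuming UTC."""
    # Timestamps such as "Oct 6 20:55:29" carry neither a year nor a timezone,
    # so the year is an explicit argument and the timezone is taken to be UTC.
    parsed = datetime.strptime(f"{year} {syslog_time}", "%Y %b %d %H:%M:%S")
    utc = parsed.replace(tzinfo=timezone.utc)
    # Logstash-style timestamps use millisecond precision and a "Z" suffix.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(expected_timestamp("Oct 6 20:55:29", 2015))
```

For the input line used in the examples above this yields the same string as the `@timestamp` value in the expected event.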
This command will run this test case file through Logstash Filter Verifier (replace all "path/to" with the actual paths to the files, obviously):
$ path/to/logstash-filter-verifier path/to/syslog.json path/to/filters
If the test is successful, Logstash Filter Verifier will terminate with a zero exit code and (almost) no output. If the test fails it'll run `diff -u` (or some other command if you use the `--diff-command` flag) to compare the pretty-printed JSON representation of the expected and actual events.
The actual event emitted by Logstash will contain a `@version` field, but since that field isn't interesting it's ignored by default when reading the actual event. Hence we don't need to include it in the expected event either. Additional fields can be ignored with the `ignore` array property in the test case file (see details below).
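The comparison described above can be pictured with a small Python sketch. It is an illustration of the idea only, not how Logstash Filter Verifier is implemented: the function names are hypothetical, and Python's `difflib` stands in for the external `diff -u` command.

```python
import difflib
import json

# "@version" is ignored by default according to the text above.
DEFAULT_IGNORED = ("@version",)


def strip_ignored(event, ignored):
    """Return a copy of the event without the ignored top-level fields."""
    return {key: value for key, value in event.items() if key not in ignored}


def pretty(event):
    """Pretty-print an event as sorted, indented JSON, split into lines."""
    return json.dumps(event, indent=2, sort_keys=True).splitlines()


def diff_events(expected, actual, ignored=DEFAULT_IGNORED):
    """Return unified-diff lines; an empty list means the events match."""
    return list(difflib.unified_diff(
        pretty(expected),
        pretty(strip_ignored(actual, ignored)),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    ))


expected = {"host": "myhost", "pid": 31993}
actual = {"host": "myhost", "pid": 31993, "@version": "1"}
for line in diff_events(expected, actual):
    print(line)
```

Because `@version` is stripped before the comparison, the two events above are considered equal and nothing is printed.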
In Beats you can also specify fields to control the behavior of the Logstash pipeline.
An example in Beats config might look like this:
```yaml
- input_type: log
  paths: ["/var/log/work/*.log"]
  fields:
    type: openlog
- input_type: log
  paths: ["/var/log/trace/*.trc"]
  fields:
    type: trace
```

The Logstash configuration would then look like this to check the given field:

```
if ([fields][type] == "openlog") {
  # Do something for type openlog
}
```

But in order to test this behavior with LFV you have to specify the field like so:

```
{
  "fields": {
    "[fields][type]": "openlog"
  },
```

The reason is that Beats by default inserts the declared fields under a root element named `fields`, while LFV just treats `fields` as a configuration option. Alternatively, Beats can be told to place the declared fields at the root of the event:

```yaml
fields_under_root: true
```
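The difference between the two Beats settings can be sketched in a few lines of Python. This is only a model of the resulting event shape under the behavior described above; `beats_event` is a hypothetical helper, not a Beats or Logstash Filter Verifier API.

```python
def beats_event(message, fields, fields_under_root=False):
    """Model where Beats places the configured fields in the event it ships."""
    event = {"message": message}
    if fields_under_root:
        # The configured fields become top-level fields of the event.
        event.update(fields)
    else:
        # Default: the configured fields are nested under a "fields" object.
        event["fields"] = dict(fields)
    return event


print(beats_event("hello", {"type": "openlog"}))
print(beats_event("hello", {"type": "openlog"}, fields_under_root=True))
```

In the default case the Logstash condition has to look at `[fields][type]`, which is why the test case file injects `[fields][type]`; with `fields_under_root` the condition and the injected field would both refer to `type` directly.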
I always prefer to configure applications to emit JSON objects whenever possible so that I don't have to write complex and/or ambiguous grok expressions. Here's an example:
{"message": "This is a test message", "client": "127.0.0.1", "host": "myhost", "time": "2015-10-06T20:55:29Z"}
When you feed events like this to Logstash it's likely that the input used will have its codec set to "json_lines". This is something we should mimic on the Logstash Filter Verifier side too. Use `codec` for that:
Sample with JSON format:
```json
{
  "fields": {
    "type": "app"
  },
  "codec": "json_lines",
  "ignore": ["host"],
  "testcases": [
    {
      "input": [
        "{\"message\": \"This is a test message\", \"client\": \"127.0.0.1\", \"time\": \"2015-10-06T20:55:29Z\"}"
      ],
      "expected": [
        {
          "@timestamp": "2015-10-06T20:55:29.000Z",
          "client": "localhost",
          "clientip": "127.0.0.1",
          "message": "This is a test message",
          "type": "app"
        }
      ]
    }
  ]
}
```
Sample with YAML format:
```yaml
fields:
  type: "app"
codec: "json_lines"
ignore:
  - "host"
testcases:
  - input:
      - >
        {
        "message": "This is a test message",
        "client": "127.0.0.1",
        "time": "2015-10-06T20:55:29Z"
        }
    expected:
      - "@timestamp": "2015-10-06T20:55:29.000Z"
        client: "localhost"
        clientip: "127.0.0.1"
        message: "This is a test message"
        type: "app"
```
There are a few points to be made here:
* In the JSON format, the double quotes inside the input string must be escaped. YAML values sometimes need quoting too, for example if the value starts with `[` or `{` or if a numeric value should be forced to be parsed as a string.
* Using `>` to create folded lines in the YAML representation makes the input JSON much easier to read.
* The filters being tested here transform the IP address in the `client` field into a hostname and copy the original IP address into the `clientip` field. To avoid future problems and flaky tests, pick a hostname or IP address for the test case that will always resolve to the same thing. As in this example, localhost and 127.0.0.1 should be safe picks.
* If the input event doesn't contain a `host` field, Logstash will add such a field containing the name of the current host. To avoid test cases that behave differently depending on the host where they're run, we ignore that field with the `ignore` property.
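One way to avoid escaping the inner double quotes by hand is to generate the JSON test case file with a script. The following Python sketch rebuilds the JSON sample above; the variable names are purely illustrative and the file would still have to be written to disk by you.

```python
import json

# The event as the application would log it (one JSON object per line).
event = {
    "message": "This is a test message",
    "client": "127.0.0.1",
    "time": "2015-10-06T20:55:29Z",
}

testcase_file = {
    "fields": {"type": "app"},
    "codec": "json_lines",
    "ignore": ["host"],
    "testcases": [
        {
            # json.dumps turns the event into a single-line string; dumping
            # the whole file afterwards escapes the inner quotes for us.
            "input": [json.dumps(event)],
            "expected": [
                {
                    "@timestamp": "2015-10-06T20:55:29.000Z",
                    "client": "localhost",
                    "clientip": "127.0.0.1",
                    "message": "This is a test message",
                    "type": "app",
                }
            ],
        }
    ],
}

text = json.dumps(testcase_file, indent=2)
print(text)
```

The printed document contains the input line with every inner quote escaped, equivalent to the hand-written JSON sample.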
Test case files are JSON or YAML files containing a single object. That object may have the following properties:

* `codec`: A string with the codec configuration of the input plugin used when executing the tests. This string will be included verbatim in the Logstash configuration so it could either be just the name of the codec plugin (normally `line` or `json_lines`) or include additional codec options like e.g. `plain { charset => "ISO-8859-1" }`.
* `fields`: An object containing the fields that all input messages should have. This is vital since filters typically are configured based on the event's type and/or tags. Scalar values (strings, numbers, and booleans) are supported, as are objects (containing scalars, arrays and nested objects), arrays of scalars and nested arrays. The only combination that is not allowed is objects within arrays. A shorthand for defining nested fields is to use Logstash's field reference syntax (`[field][subfield]`), i.e. `fields: {"[log][file][path]": "/tmp/test.log"}` is equivalent to `fields: {"log": {"file": {"path": "/tmp/test.log"}}}`.
* `ignore`: An array with the names of the fields that should be removed from the events that Logstash emits. This is for example useful for dynamically generated fields whose contents can't be predicted and hardwired into the test case file. If you need to exclude individual subfields you can use Logstash's field reference syntax, i.e. `[log][file][path]` will exclude that field but keep other subfields of `log` like e.g. `[log][level]` and `[log][file][line]`.
* `testcases`: An array of test case objects, each having the following contents:
  * `input`: An array with the lines of input (each line being a string) that should be fed to the Logstash process. If you use the `json_lines` codec you can use Logstash's field reference syntax for fields in the JSON object, making `{"message": "my message", "[log][file][path]": "/tmp/test.log"}` equivalent to `{"message": "my message", "log": {"file": {"path": "/tmp/test.log"}}}`.
  * `expected`: An array of JSON objects with the events to be expected. They will be compared to the actual events produced by the Logstash process.
  * `description`: An optional textual description of the test case, e.g. useful as documentation. This text will be included in the program's progress messages.
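The field reference shorthand used by `fields`, `ignore` and `input` can be illustrated with a short Python sketch. It only models the equivalences stated above; the function names are hypothetical and the code is not taken from Logstash Filter Verifier.

```python
import re

# Matches one "[name]" segment of a field reference such as "[log][file][path]".
FIELD_REF = re.compile(r"\[([^\[\]]+)\]")


def parse_field_ref(key):
    """Split "[log][file][path]" into its parts; plain names yield [name]."""
    parts = FIELD_REF.findall(key)
    if parts and "".join(f"[{part}]" for part in parts) == key:
        return parts
    return [key]


def expand_fields(fields):
    """Turn field-reference keys into nested objects."""
    result = {}
    for key, value in fields.items():
        *parents, leaf = parse_field_ref(key)
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


def remove_field(event, key):
    """Remove one (possibly nested) field in place, keeping sibling subfields."""
    *parents, leaf = parse_field_ref(key)
    node = event
    for name in parents:
        node = node.get(name)
        if not isinstance(node, dict):
            return  # the referenced field doesn't exist; nothing to remove
    node.pop(leaf, None)


print(expand_fields({"[log][file][path]": "/tmp/test.log"}))

event = {"log": {"level": "info", "file": {"path": "/tmp/test.log", "line": 3}}}
remove_field(event, "[log][file][path]")
print(event)
```

The first call produces the nested form of the shorthand, and the second shows that ignoring `[log][file][path]` leaves `[log][level]` and `[log][file][line]` in place.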
Originally the `input` and `expected` configuration keys were at the top level of the test case file. They were later moved into the `testcases` key but the old configuration format is still supported.
To migrate test case files from the old to the new file format the following command using jq can be used (run it in the directory containing the test case files):
```
for f in *.json ; do
    jq '{ codec, fields, ignore, testcases:[[.input[]], [.expected[]]] | transpose | map({input: [.[0]], expected: [.[1]]})} | with_entries(select(.value != null))' "$f" > "$f.migrated" && mv "$f.migrated" "$f"
done
```
This command only works for test case files where there's a one-to-one mapping between the elements of the `input` array and the elements of the `expected` array. If you e.g. have drop and/or split filters in your Logstash configuration you'll have to patch the converted test case file by hand afterwards.
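For readers who prefer to see the transformation spelled out, here is a Python sketch of the same one-to-one migration. It is an illustrative equivalent of the jq filter, not a tool shipped with Logstash Filter Verifier, and the `migrate` function name is hypothetical. Unlike the jq version it refuses to convert files where the two arrays differ in length instead of silently producing a test case that needs patching.

```python
def migrate(old):
    """Convert an old-format test case object to the `testcases` format."""
    inputs = old.get("input", [])
    expected = old.get("expected", [])
    if len(inputs) != len(expected):
        # No one-to-one mapping (e.g. drop or split filters are involved),
        # so the file has to be converted by hand.
        raise ValueError("input and expected differ in length")
    # Carry over the top-level settings that are actually present.
    new = {
        key: old[key]
        for key in ("codec", "fields", "ignore")
        if old.get(key) is not None
    }
    # Pair each input line with the event expected for it.
    new["testcases"] = [
        {"input": [line], "expected": [event]}
        for line, event in zip(inputs, expected)
    ]
    return new


old_format = {
    "fields": {"type": "syslog"},
    "input": ["line one", "line two"],
    "expected": [{"message": "line one"}, {"message": "line two"}],
}
print(migrate(old_format))
```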
`--sockets` flag
The `--sockets` command line flag makes Logstash Filter Verifier send the input to Logstash over Unix domain sockets instead of stdin. The advantage of this approach is that a single Logstash instance can process the test case files in parallel, instead of a new Logstash instance being started for every test case file. Because Logstash is known to start slowly, this reduces the time needed significantly, especially if there are lots of different test case files.
For the test cases to work properly together with the Unix domain socket input, the test case files need to include the property `codec` set to the value `line` (or `json_lines`, if JSON-formatted input should be processed).
`--logstash-arg` flag
The `--logstash-arg` flag is used to supply additional command line arguments or flags for Logstash. Those arguments are not processed by Logstash Filter Verifier other than being forwarded to Logstash. For a Logstash flag consisting of a flag name and a value, a separate `--logstash-arg` has to be provided for each of the two, in the correct order. Because values starting with one or two dashes (`-`) are treated as flags by Logstash Filter Verifier, such values must not be separated from `--logstash-arg` with a space; they have to be attached with an equals sign (`=`).
For example to set the Logstash node name the following arguments have to be provided to Logstash Filter Verifier:
--logstash-arg=--node.name --logstash-arg MyInstanceName
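If you invoke Logstash Filter Verifier from a script, the rule above can be encoded in a small helper. The Python sketch below is only an illustration of that rule; `logstash_arg_flags` is a hypothetical name and the resulting list would be appended to whatever command line you already build.

```python
def logstash_arg_flags(logstash_args):
    """Wrap each Logstash argument in its own --logstash-arg, in order."""
    flags = []
    for arg in logstash_args:
        if arg.startswith("-"):
            # Values that look like flags must be attached with "=".
            flags.append(f"--logstash-arg={arg}")
        else:
            # Plain values may follow --logstash-arg as a separate word.
            flags.extend(["--logstash-arg", arg])
    return flags


print(logstash_arg_flags(["--node.name", "MyInstanceName"]))
```

For the node name example this produces the same three words as the command line shown above.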
Different versions of Logstash behave slightly differently and changes in Logstash may require changes in Logstash Filter Verifier. Upon startup, the program will attempt to auto-detect the version of Logstash used and will use this information to adapt its own behavior.
Starting with Logstash 5.0, finding out the Logstash version is very quick, but in previous versions the version string was printed by Ruby code in the JVM so it took several seconds. To avoid this delay you can use the `--logstash-version` flag to tell Logstash Filter Verifier which version of Logstash it should expect. Example:
logstash-filter-verifier ... --logstash-version 2.4.0
Logstash Filter Verifier has been reported to work on Windows, but this isn't tested by the author and it's not guaranteed to work. There are a couple of known quirks that are easy to work around:
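* The location of the Logstash executable has to be provided explicitly with the `--logstash-path` flag.
* The default value of `--diff-command` is `diff -u`, which won't work on typical Windows machines. You'll have to explicitly select which diff tool to use.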
This software is copyright 2015–2020 by Magnus Bäck <[email protected]> and licensed under the Apache 2.0 license. See the LICENSE file for the full license text.