Determining buffer size for RedisHandler in Python logbook

A few weeks ago here in Stockholm we had a meetup about Elasticsearch. The talks covered projects that use Elasticsearch, and one of them was Logstash. Logstash is a tool for managing events and logs. Basically, it collects your log events and stores them in Elasticsearch. Later on, with the provided web interface, Kibana, you can search through all your logs in a really cool way.

I thought that this would be very useful for us, so I started to do some testing. The tests went so smoothly that we decided to put Logstash into production, and here comes the story. The figure below represents the recommended Logstash stack:

Logstash architecture

The message broker they recommend is Redis, and this is the one I tested. The majority of our scripts and tools are written in Python, and for logging we’re using a package called logbook. In order to start sending messages to Logstash as easily and unobtrusively as possible, I decided to implement a new log Handler for Redis, so that to log into Logstash we just have to add the new Handler to our Logger. If you don’t know that much about Python logging, take a quick look at this page.

After I implemented the first version of the handler and sent a pull request to the upstream project, Gustavo J. A. M. Carneiro, who has previously contributed to logbook, suggested that I use buffering. My first implementation pushed every single message that the handler emitted straight to Redis. I therefore implemented a buffer that is emptied when it reaches a certain size, or every second if it never reaches that size, pushing the whole set of messages at once. But… what size is optimal?
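To make the buffering idea concrete, here is a minimal sketch of the technique using only the standard library. It is not the actual RedisHandler code: the class name `BufferedSender` and the `send_batch` callback (which stands in for whatever pushes a list of messages to Redis) are assumptions made for illustration. The buffer is flushed when it reaches `flush_threshold` messages, and a background thread also flushes it every `flush_time` seconds.

```python
import threading


class BufferedSender:
    """Hypothetical helper: collects messages and sends them in batches."""

    def __init__(self, send_batch, flush_threshold=128, flush_time=1.0):
        self.send_batch = send_batch          # callable taking a list of messages
        self.flush_threshold = flush_threshold
        self.flush_time = flush_time
        self._buffer = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._timer = threading.Thread(target=self._flush_periodically,
                                       daemon=True)
        self._timer.start()

    def emit(self, message):
        with self._lock:
            self._buffer.append(message)
            # Size-based flush: the buffer is full, send everything at once.
            if len(self._buffer) >= self.flush_threshold:
                self._flush_locked()

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        # Caller must hold the lock. Empty buffers are not sent.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.send_batch(batch)

    def _flush_periodically(self):
        # Event.wait returns False on timeout and True once close() is called.
        while not self._stop.wait(self.flush_time):
            self.flush()

    def close(self):
        self._stop.set()
        self._timer.join()
        self.flush()  # send whatever is still buffered


# Usage: collect batches in a list instead of talking to a real Redis server.
batches = []
sender = BufferedSender(batches.append, flush_threshold=3, flush_time=60)
for i in range(7):
    sender.emit("msg %d" % i)
sender.close()
print([len(batch) for batch in batches])
```

In this usage the timer interval is set high on purpose, so the first two batches come from the size threshold and the last one from the final flush in `close()`.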

Actually, as Gustavo suggested, the only way to determine the size of the buffer is to run some tests, and that’s what I did. To test the performance of the handler, I prepared a script that, for a set of buffer sizes, measures the time required to insert a certain number of messages. I tested different buffer sizes and different numbers of messages sent in a row. These are the results:
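The shape of such a benchmark can be sketched as follows. This is not the script I used: `run_benchmark` and `fake_push` are hypothetical names, and `fake_push` is a no-op standing in for a real Redis push, so the timings it prints say nothing about Redis itself. It only shows how the time and the number of pushes can be measured per buffer size (the time-based flush is left out for simplicity).

```python
import time


def run_benchmark(push_batch, n_messages, buffer_sizes):
    """Return {buffer_size: (seconds, push_calls)} for sending n_messages."""
    results = {}
    for size in buffer_sizes:
        buffer, calls = [], 0
        start = time.perf_counter()
        for i in range(n_messages):
            buffer.append("message %d" % i)
            if len(buffer) >= size:
                push_batch(buffer)
                buffer, calls = [], calls + 1
        if buffer:  # push whatever is left over
            push_batch(buffer)
            calls += 1
        results[size] = (time.perf_counter() - start, calls)
    return results


def fake_push(batch):
    """Hypothetical stand-in for a Redis round trip (no network involved)."""
    pass


results = run_benchmark(fake_push, 1000, [1, 16, 128, 1024])
for size, (seconds, calls) in sorted(results.items()):
    print("buffer=%4d  pushes=%4d  time=%.6fs" % (size, calls, seconds))
```

To measure something meaningful, `fake_push` would have to be replaced with a function that really pushes the batch to a Redis server, since the cost being amortised by the buffer is the round trip per push.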

Performance 1000000

If we zoom in, we can see that, in all cases, a buffer of 128 messages let us insert more messages in less time. Buffering is indeed necessary; without it, the handler becomes really slow for higher numbers of events:

Performance 100

Performance 1000

Performance 10000

Performance 100000

The relationship between speed and buffer size seems stable throughout all tests, but not all corner cases are covered in this benchmark. Other factors, like increasing or decreasing the interval for the automatic flushing (currently one second, as mentioned before), can also influence performance.

But at least these tests justify the default parameters of the RedisHandler in logbook (which are totally customisable, by the way).

Last but not least, thanks to Gustavo and Roman for their advice during the development of the handler (you can see our discussions in the pull request to the master project).

Guillermo Carrasco

In automation, we trust.