I’ve been a heavy user of Elasticsearch for coming up on 7 years now. During that time I’ve used it for a few main use cases: a search engine, an APM solution (after New Relic started being stupidly expensive), a backend for Jaeger, and a log storage system. In all of those use cases I’ve really pushed Elasticsearch to its limits, with hundreds of terabytes of data across dozens of machines and tens of thousands of shards, and in all that time I’ve found that it really only works well for one of those situations. Particularly with Elastic’s push towards being anti-user, I wanted to question whether storing log data is a good use case for Elasticsearch and suggest some better options.
Elasticsearch (actually Lucene, but for our purposes here they are interchangeable) was designed with the search engine index use case in mind (and I’ve found it works exceedingly well for this). To that end, it expects documents to follow a relatively static structure (at least within a given index), and optimises for full-text search (notably through the use of an inverted index). But are these what we want for log storage? I’d argue no.
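To make the full-text optimisation concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration of the general idea, not how Lucene actually stores its postings; the function names `build_inverted_index` and `search` and the sample log lines are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def search(index, query):
    """Return IDs of documents that contain every token in the query (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "connection timeout while calling payments",
    2: "payments service started",
    3: "connection reset by peer",
}
index = build_inverted_index(docs)
```

Looking up a word becomes a dictionary access rather than a scan over every document, which is great for "find documents containing these terms" and much less obviously useful for "show me the last hour of logs from this service".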
In a large company, you generally find many different services logging many different types of things. If you’re using structured logging, this means many different teams logging many different field names. In order not to place too much of a burden on your users, you’re probably using something like dynamic mappings to update your index mapping whenever a new field is seen. This works fine, until one day your index suddenly stops accepting documents that introduce new fields. You’ve hit your maximum number of fields per index. You bump that and continue on, but it’ll keep happening. And what’s worse, your search performance will degrade more and more, and your index size will bloat, as more fields pile up in the index.
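The failure mode above can be sketched with a toy model. This is not the Elasticsearch client or its real mapping logic: `ToyIndex`, `FieldLimitExceeded` and `simulate` are hypothetical names, and the model only counts top-level field names. The default of 1000 mirrors Elasticsearch's `index.mapping.total_fields.limit` setting; the team and field counts are invented for illustration.

```python
class FieldLimitExceeded(Exception):
    """Raised when a document would push the mapping past the field limit."""


class ToyIndex:
    """Toy model of an index with dynamic mapping and a total-fields limit."""

    def __init__(self, total_fields_limit=1000):
        self.total_fields_limit = total_fields_limit
        self.mapping = set()   # every field name seen so far
        self.doc_count = 0

    def index(self, doc):
        # Dynamic mapping: any unseen field name would be added to the mapping.
        new_fields = set(doc) - self.mapping
        if len(self.mapping) + len(new_fields) > self.total_fields_limit:
            raise FieldLimitExceeded(
                f"limit of {self.total_fields_limit} total fields exceeded"
            )
        self.mapping |= new_fields
        self.doc_count += 1


def simulate(teams, fields_per_team, limit=1000):
    """Index one document per team; return the index and the first rejected team."""
    idx = ToyIndex(limit)
    for team in range(teams):
        # Each team logs its own set of field names (assumed not to overlap).
        doc = {f"team{team}.field{i}": "value" for i in range(fields_per_team)}
        try:
            idx.index(doc)
        except FieldLimitExceeded:
            return idx, team
    return idx, None


idx, first_rejected = simulate(teams=50, fields_per_team=25)
```

With 25 distinct fields per team, the 41st team is the first to have its logs rejected, and nothing about their own logging looks wrong from where they sit. Raising the limit only moves the point at which the next team hits it.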