Shay Banon: ElasticSearch for Big Data and Analytics

August 18, 2012

Shay Banon is the author of ElasticSearch, an open-source, distributed search server, based on Lucene. He gave the following talk at Berlin Buzzwords 2012 (The conference of High Scalability), on June 5, 2012.

You can read the outline of the talk here.

ElasticSearch basic concepts

People use ElasticSearch (ES) mostly for full-text search, but it can also store large amounts of data and serve as a basis for analytics. The question is always the following:

How does data flow?

Shay outlines the basic ES concepts we need to understand:

  • index: is a logical namespace which maps to one or more primary shards and can have zero or more replicas. It is like a database in the RDBMS world, but much more.

  • shard: is a Lucene instance, a low-level worker unit, managed by ES.

  • replica: is an exact copy of a primary shard. It may be used to load-balance queries and to increase failover capacity when a node fails.

  • node: is a running instance of ES, which belongs to a cluster. It may host multiple shards and/or replicas for multiple indices.

As each shard has its cost, one needs to plan ahead and decide on the types and numbers of indices, shards and replicas to use. Fortunately, it is very easy to run capacity tests, measure the load, and choose a configuration based on the results.
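
To make the cost argument concrete, here is a minimal sketch in Python. It assumes the index-creation settings `number_of_shards` and `number_of_replicas`; the helper names and the example values (5 primaries, 1 replica) are hypothetical and not taken from the talk. It only builds the request body and counts shard copies, it does not contact a cluster.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build an index-creation body with the given shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary and every replica is its own Lucene instance."""
    return primary_shards * (1 + replicas)


# Hypothetical example values, chosen only for illustration.
body = index_settings(primary_shards=5, replicas=1)
copies = total_shard_copies(primary_shards=5, replicas=1)

print(json.dumps(body))
print(copies)  # 5 primaries * (1 + 1 replica) = 10 Lucene instances
```

Multiplying this number by the count of indices gives a rough idea of how many Lucene instances the cluster has to host, which is the figure a capacity test would try to validate.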

Data flow examples

There are different design patterns for different use cases, but each of them focuses on how we would like to move the data around.

  • One index - a sensible default if we are starting small.
  • One index per user - if searches are user-centric; it can be extended with routing and aliasing.
  • Time-based index - e.g. one index for each day, week or month. It is easy to create an alias such as "last 3 months", and old indices can be optimized, moved or deleted easily.
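
The time-based pattern can be sketched as follows. This Python snippet assumes one index per month named `<prefix>-YYYY-MM` and builds the body for an alias update (a list of `add` actions, as accepted by the `_aliases` endpoint). The naming scheme, the `logs` prefix, the alias name and the helper functions are all hypothetical; nothing is sent to a server.

```python
from datetime import date
from typing import List


def monthly_index_name(prefix: str, day: date) -> str:
    """Name of the monthly index that holds events for the given day."""
    return f"{prefix}-{day.year:04d}-{day.month:02d}"


def last_n_months(today: date, n: int) -> List[date]:
    """First day of the current month and of the n-1 months before it."""
    months = []
    year, month = today.year, today.month
    for _ in range(n):
        months.append(date(year, month, 1))
        month -= 1
        if month == 0:  # roll over into December of the previous year
            year, month = year - 1, 12
    return months


def alias_actions(alias: str, prefix: str, today: date, n: int) -> dict:
    """Body that points one alias at the n most recent monthly indices."""
    return {
        "actions": [
            {"add": {"index": monthly_index_name(prefix, m), "alias": alias}}
            for m in last_n_months(today, n)
        ]
    }


# Hypothetical names; the date is picked to show the year rollover.
actions = alias_actions("logs-last-3-months", "logs", date(2012, 1, 15), 3)
for action in actions["actions"]:
    print(action["add"]["index"])
```

Searches then go against the alias, while the indices behind it can be swapped as time passes. A real rotation would also need `remove` actions for the index that drops out of the window, which this sketch leaves out.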

Data analytics

Shay outlines an example of a time-based event log with a few components and categories. He uses it to demonstrate how little effort it takes to write queries that slice and dice the data. One can use these aggregations to create tables, graphs, or histograms such as counts per day or counts per country.
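
To illustrate what "counts per day" and "counts per country" mean, the following Python sketch computes both over a handful of made-up events. It is a local stand-in for the kind of result such aggregations return, not an ElasticSearch query; the field names (`timestamp`, `country`, `component`) and the sample records are hypothetical.

```python
from collections import Counter

# Made-up event log entries, for illustration only.
events = [
    {"timestamp": "2012-06-04T09:15:00", "country": "DE", "component": "web"},
    {"timestamp": "2012-06-04T17:40:00", "country": "US", "component": "api"},
    {"timestamp": "2012-06-05T08:05:00", "country": "DE", "component": "web"},
    {"timestamp": "2012-06-05T11:30:00", "country": "DE", "component": "api"},
    {"timestamp": "2012-06-05T23:59:00", "country": "FR", "component": "web"},
]

# Bucket by calendar day: the first 10 characters of an ISO timestamp.
counts_per_day = Counter(e["timestamp"][:10] for e in events)

# Bucket by the value of a single field.
counts_per_country = Counter(e["country"] for e in events)

for day, count in sorted(counts_per_day.items()):
    print(day, count)
for country, count in counts_per_country.most_common():
    print(country, count)
```

Each counter maps a bucket key to a document count, which is the shape needed to draw a histogram or fill a table.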

Last updated: August 29, 2014