10 Feb 2016, 09:30

Around the Data - February 2016 - ELK, Kafka, Flink, Spark





  • Spark 2015 year in review : Databricks (core developer of Spark) published a review of 2015 : the 4 releases, the features, how Spark is used, etc. I was surprised to see that the majority of Spark usage was as a standalone cluster and not in a Hadoop context.

11 Nov 2015, 09:30

Around the Data - November 2015 - Diving in Kafka and ELK 2.0 releases


  • A 2-part blog post series (part 1, part 2) on the genesis of Kafka within LinkedIn before it was open-sourced and hosted by the Apache foundation ; always interesting to know the first use cases the software was built for and how it became the distributed messaging system we now know. It also covers current use cases for Kafka.
  • Putting Apache Kafka to use : a practical guide to building a streaming data platform (part 1 ; part 2) : the first part is about the shift to the event-based approach and the definition of a streaming data platform. The second part is about implementation best practices.
  • A Kafka presentation from the MixIt event (in French), which introduces Kafka and how it is used at EDF for the "Linky" device (energy monitor).

ElasticSearch / Logstash / Kibana

13 Feb 2014, 12:50

Elasticsearch 1.0 - distributed & RESTful search engine

ElasticSearch (ES) is a distributed, RESTful search engine based on the (famous) Apache Lucene library ; if you use indexing services in your application, you may already use it or Solr, which is also based on Lucene.

Besides ElasticSearch, the company behind the product also releases 3 other open-source products linked with ES :

  • Kibana to produce dashboards and reports from ES and more widely to interact with data in ES.
  • Logstash combined with ES to analyse logs & events.
  • Marvel was also just released to monitor your ES cluster.

So with the 1.0 release, a lot of things have been included in ES, which already had a lot of interesting features (Documentation).

There are also bindings for PHP, Java, Perl, Python, Ruby, JavaScript ; so you should be able to integrate it with your app easily.

Xebia (a French consulting company) published some blog posts on ES last December which show you how to get started with ES :

I played with it a little bit and was quite impressed by its relevance. The only "issue" was pushing content through the REST API, as indexing binary files is not native, but there is a file system river for that.
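To give an idea of what pushing content through the REST API looks like, here is a minimal Python sketch using only the standard library. It only builds the HTTP requests (a PUT to index one JSON document, and a URI search with the q= parameter) without sending them. The node address, the "blog" index, the "post" type and the helper names are assumptions made for illustration, and the index/type/id URL layout is the one used by the 1.x releases. It covers JSON documents only ; binary files would still need something like the river mentioned above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local node listening on the default HTTP port.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document, base_url=ES_URL):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    # Percent-encode each path segment so unusual ids cannot break the URL.
    path = "/".join(
        urllib.parse.quote(str(part), safe="") for part in (index, doc_type, doc_id)
    )
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_search_request(index, query, base_url=ES_URL):
    """Build (but do not send) a URI search request using the q= parameter."""
    params = urllib.parse.urlencode({"q": query})
    return urllib.request.Request(
        url=f"{base_url}/{urllib.parse.quote(index, safe='')}/_search?{params}",
        method="GET",
    )


if __name__ == "__main__":
    # Hypothetical index, type and document, for illustration only.
    req = build_index_request("blog", "post", 1, {"title": "Elasticsearch 1.0"})
    print(req.get_method(), req.full_url)
    # Sending it requires a running node:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping the request construction separate from the actual call makes it easy to check the URL and the body before pointing the script at a real cluster.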

So if you need to index/retrieve content or manipulate data, I would recommend having a look at the ElasticSearch ecosystem.

ElasticSearch is also used in Graylog² and, according to a colleague, it is very relevant for log analysis when used with the Graylog Extended Log Format (GELF). If anyone has experience with Graylog² vs ES/Logstash/Kibana on this, I'm interested in their opinions !