16 Mar 2016, 09:30

Around the Data - March 2016 - Kafka Connect and Streams, Hortonworks HDP 2.4, Elasticsearch cluster sizing, Spark Graph Frames, Flink 1.0


  • Kafka and Confluent 2.0.1 were released (bugfix release)
  • A more detailed presentation about Kafka Connect.
  • Confluent released a custom version of what will be in Kafka 0.10, called Kafka Streams ; it aims to manipulate data within Kafka without requiring any external system such as Spark & co. It is an interesting move to make Kafka richer in features and more autonomous for enriching data, but the challenge may be that Kafka was performant thanks to its small feature set. Will it remain performant with such additions ?

Hortonworks (Hadoop Distribution)



  • Databricks introduces Graph Frames ; it's built on top of Spark Data Frames (and related APIs) and aims to ease manipulating graph data.


10 Feb 2016, 09:30

Around the Data - February 2016 - ELK, Kafka, Flink, Spark





  • Spark 2015 year in review : Databricks (core developer of Spark) published a review of 2015 : the 4 releases, the features, how Spark is used, etc. I was surprised to see that the majority of Spark usage was as a standalone cluster and not in a Hadoop context.

13 Jan 2016, 09:30

Around the Data - January 2016 - Spark & Elasticsearch


  • Spark-TS is an add-on to Spark by Cloudera to ease working with time-series data : announcement ; code ; website
  • SparkR is the ability to use Spark with R (programming language for statistical computing and graphics) ; a blog post in French to discover this API.
  • Spark 1.6 released (via); beyond bugfixes, improvements and performance:
    • New Dataset API (flagged as experimental) : it brings the SparkSQL engine on top of RDDs ; changes to the API between Datasets and DataFrames will be made after the 1.6 release.
    • New models/algorithms for MLlib
  • Introducing Spark Datasets : the blog post presents Datasets and how they are more performant on structured data than RDDs or DataFrames.




11 Nov 2015, 09:30

Around the Data - November 2015 - Diving in Kafka and ELK 2.0 releases


  • A 2-part blog post series (part 1, part 2) on the genesis of Kafka within LinkedIn before it was open-sourced and hosted by the Apache foundation ; it is always interesting to know the first use cases the software was built for and how it became the distributed messaging system we now know. It also covers current use cases for Kafka.
  • Putting Apache Kafka to use : a practical guide to building a streaming data platform (part 1 ; part 2) : the first part is about the shift to the event-based approach and the definition of a streaming data platform. The second part is about implementation best practices.
  • A Kafka presentation from the Mix-IT event (in French), which introduces Kafka and how it is used at EDF for the "Linky" device (energy monitor)

ElasticSearch / Logstash / Kibana

27 May 2015, 09:30

Around the Web - May 2015



  • A day at Devoxx France (in French) : a summary from Xebia about the Devoxx France conference (Java based but not only) and their findings.
  • Mix-IT was in Lyon in April, and the M6 Web tech team wrote a report in French - Day 1 - Day 2 ; it deals with both tech and agile topics.




  • The Apple Watch: User-Experience Appraisal : a review of how your app should behave (or not behave) on the new Apple Watch ; it also covers the transition with the iPhone, how to deal with content and how you should manage your interactions.

Web performance

NoSQL, ElasticSearch

  • Elastic released a new (commercial) plugin for ElasticSearch called "Watcher", which aims to raise "alerts" when some events occur ; according to some conditions, it may trigger an action (email being sent, interaction with another system, etc).
  • M6 Web Tech team published a video (in French) introducing Cassandra.


  • Indoor geolocation technology : article (in French) about indoor geolocation technology, describing and comparing Wi-Fi vs NFC vs Beacon vs magnetic field to provide geolocation.

29 Apr 2015, 09:30

Around the Web - April 2015

Ergonomy / User Experience

  • The best icon is a text label : a reminder that icons must be meaningful, with some examples of do's/don'ts and the conclusion that a text label may be more accurate than an icon.


Over the 4 years we have slowly moved away from device specific breakpoints in favour of content specific breakpoints, i.e. adding a breakpoint when the content is no longer easy to consume (whatever the device is).



  • M6Web Tech team published an article (English version ; French version) on how they mocked a backend application while they were building the frontend one, until the backend was finalised. Beyond the tool involved, the most important point is the "interface agreement" in which backend and frontend teams agreed on how the coming API would work and be used, to avoid bad surprises at the end as much as possible.
  • A visual guide to CSS3 Flexbox properties : the title is self-explanatory about what it is !
  • Introduction to Service Workers : Service Workers will allow offline experiences, periodic background syncs, push notifications and other things that would normally require a native application. The article introduces how service workers work and some current limitations ; if you are not familiar with the JavaScript promises syntax, have a look at this article "Javascript Promises".
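
The "interface agreement" idea from the mock backend item above can be sketched in a few lines. This is only an illustration, not the tool the M6Web team used : the endpoints, payloads and the names CONTRACT, mock_backend and same_shape are all hypothetical. The agreed examples serve the frontend as a mock, and the same examples can later be compared with what the real backend returns.

```python
import json

# Hypothetical "interface agreement": each endpoint maps to the status code and
# the example payload that both teams signed off on.
CONTRACT = {
    ("GET", "/articles/42"): (200, {"id": 42, "title": "Hello", "tags": ["web"]}),
    ("GET", "/articles/999"): (404, {"error": "not found"}),
}


def mock_backend(method, path):
    """Return (status, JSON body) for an agreed endpoint, or 501 if it is not in the contract."""
    status, payload = CONTRACT.get(
        (method.upper(), path), (501, {"error": "not in the interface agreement"})
    )
    return status, json.dumps(payload)


def same_shape(example, actual):
    """Check that a real response has the same keys and value types as the agreed example."""
    if isinstance(example, dict):
        return (
            isinstance(actual, dict)
            and example.keys() == actual.keys()
            and all(same_shape(example[k], actual[k]) for k in example)
        )
    return type(example) is type(actual)


# The frontend can develop against the mock while the backend is being built.
status, body = mock_backend("get", "/articles/42")
```

Once the real backend exists, same_shape(agreed_example, real_response) gives a cheap check that it still honours the agreement ; lists are only compared by type here, which is a deliberate simplification.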

Responsive Web Design


Virtualisation (Docker)

[Edit 18/5, 21/5, 28/5,19/6,10/7 - Update docker tutorial list]

13 Feb 2014, 12:50

Elasticsearch 1.0 - distributed & RESTful search engine

ElasticSearch (ES) is a distributed, RESTful search engine based on the (famous) Apache Lucene library ; if you use indexing services in your application, you may already use it or Solr, which is also based on Lucene.

Besides ElasticSearch, the company behind the product also releases 3 other open source products linked with ES :

  • Kibana to produce dashboard and reports from ES and more widely to interact with data in ES.
  • Logstash combined with ES to analyse logs & events.
  • Marvel was also just released to monitor your ES cluster.

So with the 1.0 release, a lot of things have been included in ES, which already had a lot of interesting features (Documentation).

There are also bindings for PHP, Java, Perl, Python, Ruby, Javascript ; so you should be able to integrate it with your app easily.

Xebia (French consulting company) published some blog posts on ES last December which show you how to start with ES :

I played with it a little bit and was quite impressed by its relevance. The only "issue" was pushing content through the REST API, as indexing binary files is not native, but there is a file system river for that.
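
To give an idea of what pushing content through the REST API looks like, here is a minimal sketch that builds (without sending) a request indexing one JSON document. The `/{index}/{type}/{id}` URL layout is the one used by the ES 1.x document API ; everything else is an assumption made for the example : the host, the "docs" index, the "file" type, the field names and the build_index_request helper. The binary payload is simply base64-encoded so that it fits in JSON ; ES stores it as a plain string unless a mapping or plugin decodes it, which is not shown here.

```python
import base64
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, title, raw_bytes):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    document = {
        "title": title,
        # Binary payloads cannot go into JSON as-is, so encode them as base64 text.
        # "content_b64" is a hypothetical field name.
        "content_b64": base64.b64encode(raw_bytes).decode("ascii"),
    }
    url = "{}/{}/{}/{}".format(base_url.rstrip("/"), index, doc_type, doc_id)
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local node, index, type and document id.
request = build_index_request(
    "http://localhost:9200", "docs", "file", "1", "report.pdf", b"%PDF-1.4 sample"
)
# Sending it would be urllib.request.urlopen(request), with a node listening on that URL.
```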

So if you need to index/retrieve content or manipulate data, I would recommend having a look at the ElasticSearch ecosystem.

ElasticSearch is also used in Graylog² and, according to a colleague, it would be very relevant for log analysis with the use of the Graylog Extended Log Format (GELF). If someone has experience with Graylog² vs ES/Logstash/Kibana on this, I'm interested in their opinions !