27 Jun 2016, 16:23

Farewell !

Hi there,

The bad news : this blog as you knew it will be inactive from now on.

The good news : if you read French, you can follow me on CerenIT's blog. There is of course an RSS feed, but the Mailchimp subscription has been removed.

It was, and still is, a good exercise: it lets me see trends over months, pin down dates & facts rather than rely on the information stream of Twitter & co, and of course it helped me a lot to keep a "technology radar" from which I could follow things over time. Explaining things is also a challenge ; I hope I did well on this and that you got some value from the blog posts.

Except for my family's blog, it seems this was my longest and most regular blogging experience so far: almost 4 years and 110 blog posts !

See you maybe on the other side !

27 Apr 2016, 09:30

Web, Ops & Data - April 2016

Infrastructure

Container

  • Elastikube : an infrastructure management solution for Kubernetes ; seems a lighter version of Openshift.
  • Containers are not VM : Docker clarifies the difference between a container and a VM with the analogy of a flat and a house. A house (VM) is fully featured, whereas a flat shares some resources with its building.
  • Docker announced Docker for Windows/Mac (beta) ; it provides a more native Docker experience on Mac/Windows machines. On Windows, it requires at least Windows 10 to benefit from the Hyper-V hypervisor. On this topic, the Hypriot team, in their Docker for Mac review and Docker for Windows review, introduce the easter egg of this version: it's now possible to run ARM containers on OSX/Windows environments ; this will make building containers for Raspberry or IoT projects faster and easier. I can't test it now but it seems far less complex to use than Docker Toolbox.
  • Rancher, another container orchestration solution, first announced their support of Kubernetes and then the release of version 1.0. A project to follow, even if the concept of environment may not allow an optimal allocation of resources (each environment would have its own hosts), whereas Kubernetes may allow this with its global namespace approach.
  • Kubernetes, the container orchestration solution (for Docker/Rocket containers) built by Google, was just released as version 1.2 with lots of improvements and especially a new GUI (for those not at ease with the CLI).
  • Kubernetes on ARM also released their 0.7 version ; an easy way to learn Kubernetes on your Raspberry box ;-) A Rancher image is also available.
  • Docker en production : la bataille sanglante des orchestrateurs de conteneurs (in French: "Docker in production: the bloody battle of container orchestrators") : To conclude, OCTO compared Docker Swarm, Kubernetes and Openshift and names Kubernetes as the current winner. From my point of view, as we can use Docker containers with both solutions, it may be more interesting to use both : Docker Swarm in a dev/lab oriented manner and Kubernetes more for operations. Kubernetes was initially built for ops and tries to also be developer friendly, with Docker doing the opposite.

Elasticsearch & friends

InfluxDB & Friends

Sysadmin

  • Teleport : an SSH gateway that gives access to servers via SSH or HTTPS, with a web GUI.

Security

  • VNCFail ; vous reprendrez bien un peu de non sécurité (in French; roughly, "VNCFail: care for a bit more insecurity?") : an analysis of 4 billion IPv4 addresses, among which 5 million hosts have VNC (remote control software) open and, in the end, 2,246 hosts whose systems can be accessed without any password. These include systems such as air conditioning, power plants, etc.

30 Mar 2016, 09:30

Around the Web - March 2016 - Javascript, Citus DB and Dependencies

Javascript

Postgres

Dependencies

16 Mar 2016, 09:30

Around the Data - March 2016 - Kafka Connect and Streams, Hortonworks HDP 2.4, Elasticsearch cluster sizing, Spark Graph Frames, Flink 1.0

Kafka

  • Kafka 0.9.0.1 and Confluent 2.0.1 were released (bugfix release)
  • A more detailed presentation about Kafka Connect.
  • Confluent released a custom version of what will be in Kafka 0.10, called Kafka Streams ; it aims to manipulate data within Kafka without requiring any external system such as Spark & co. An interesting move to make Kafka richer in features and more autonomous for enriching data, but the challenge may be that Kafka was performant thanks to its minimal feature set. Will it remain performant with such additions ?

Hortonworks (Hadoop Distribution)

Elasticsearch

Spark

  • Databricks introduces Graph Frames ; it's built on top of Spark Data Frames (and related APIs) and aims to ease manipulating graph data.

Flink

24 Feb 2016, 09:30

Around the Web - February 2016 - MySQL, Docker, Security & Webapps/API

MySQL

Docker

  • Official images are to move from Ubuntu to Alpine : the main arguments are about disk space savings (and so bandwidth and time to launch a container) and security (smaller attack surface).
    • "Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox" according to the site
    • First, I was sceptical as it requires the whole ecosystem to move from Ubuntu to Alpine ; indeed, whether you like it or not, people are used to Ubuntu/Debian and other mainstream distributions, and not all the packages we are used to are available in Alpine Linux yet. To be honest, the main packages are available.
    • Then, a Debian or whatever base image will still exist, rest assured ; however, if you want to "hack" / inherit from an official Docker base image, you'll have to switch to Alpine.
    • Third, we could consider that once your Docker host has the base image in cache, the ~180M size of the base image is not an issue. Still, starting from 5M may be a good argument.
    • Having started testing it on ARM devices, and especially the Raspberry Pi, I'm quite pleased with its reactivity and the packages available.
  • Some tips to reduce the size of your Docker image and also understand how size and layers impact your Docker image. Following the instructions, I could reduce my influxdb-chronograf Docker image by approx. 70M (from 360 to 290M if I'm correct).
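
To illustrate why layers matter for image size, here is a minimal Python sketch of a deliberately simplified model (an assumption for illustration, not Docker's actual storage code): each instruction adds a layer, the image size is the sum of what each layer adds, and deleting a file in a later layer hides it without reclaiming its space. The file names and sizes are made up.

```python
# Simplified model of image layers (illustrative only, not Docker's real storage code).
# Each layer records the files it adds (name -> size in MB) and the names it deletes.

def image_size(layers):
    """Total size on disk: every layer's added files count, deletions reclaim nothing."""
    return sum(sum(layer["add"].values()) for layer in layers)

def visible_size(layers):
    """Size of the files actually visible in the final filesystem."""
    files = {}
    for layer in layers:
        files.update(layer["add"])
        for name in layer["delete"]:
            files.pop(name, None)
    return sum(files.values())

# Hypothetical sizes: download an archive, install from it, then delete it,
# each step in its own instruction (so each in its own layer).
three_layers = [
    {"add": {"app.tar.gz": 70}, "delete": []},
    {"add": {"app": 120}, "delete": []},
    {"add": {}, "delete": ["app.tar.gz"]},
]
# Same steps chained in a single instruction: the archive never lands in a layer.
one_layer = [{"add": {"app": 120}, "delete": []}]

print(image_size(three_layers), visible_size(three_layers))
print(image_size(one_layer), visible_size(one_layer))
```

In this toy model both images expose the same files, but the three-layer variant still carries the deleted archive, which is the kind of saving such tips typically aim for.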

Security & API/Web App

 

10 Feb 2016, 09:30

Around the Data - February 2016 - ELK, Kafka, Flink, Spark

Elasticsearch

Kafka

Flink

Spark

  • Spark 2015 year in review : Databricks (core developer of Spark) reviewed 2015 : the 4 releases, the features, how Spark is used, etc. I was surprised to see that the majority of Spark usage was as a standalone cluster and not in a Hadoop context.

27 Jan 2016, 09:30

Around the Web - January 2016 - Website obesity crisis, AngularJS & Postgres

Website Obesity

  • Website obesity crisis : transcript of a talk (the video is also available from the link) about web performance, bloated websites and how fat sites have become, for almost no real value to end users. Long but funny and instructive. It makes you think about how complex, fat and bloated the web is nowadays, in terms of tooling, architecture, code and content.

AngularJS

Postgres

  • The most advanced open source database in the world, Postgres for short, was released in version 9.5 (FR / EN) ; it brings the long-awaited "UPSERT" feature, row-level security and some big-data features (improved indexes, faster sorts, connection to Hadoop/Cassandra via FDW, etc)
  • In French, a deeper view of the 9.5 release (part 1, part 2, part 3) to better understand what this release contains.
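
As a rough idea of what "UPSERT" means in practice (insert a row, or update it if it already exists), here is a minimal sketch. To stay self-contained it uses Python's built-in sqlite3 module as a stand-in for Postgres: SQLite (3.24 or later, an assumption about your Python build) accepts an `INSERT ... ON CONFLICT ... DO UPDATE` clause whose syntax follows the one Postgres introduced in 9.5. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (assumption: SQLite >= 3.24).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (url TEXT PRIMARY KEY, hits INTEGER NOT NULL)")

def record_hit(conn, url):
    # Insert a new row, or increment the counter if the URL already exists.
    conn.execute(
        "INSERT INTO page_views (url, hits) VALUES (?, 1) "
        "ON CONFLICT (url) DO UPDATE SET hits = page_views.hits + 1",
        (url,),
    )

for url in ["/home", "/about", "/home"]:
    record_hit(conn, url)

rows = conn.execute("SELECT url, hits FROM page_views ORDER BY url").fetchall()
print(rows)
```

Without such a clause, the same logic usually needs a SELECT followed by an INSERT or UPDATE, which is harder to get right under concurrent writes.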

 

13 Jan 2016, 09:30

Around the Data - January 2016 - Spark & Elasticsearch

Spark

  • Spark-TS is an add-on to Spark by Cloudera to ease working with time-series data : announcement ; code ; website
  • SparkR is the ability to use Spark with R (programming language for statistical computing and graphics) ; a blog post in French to discover this API.
  • Spark 1.6 released (via); beyond bugfixes, improvements and performance:
    • New Dataset API (flagged as experimental) : it brings the SparkSQL engine on top of RDDs ; changes to the API between Datasets and Dataframes will be made after the 1.6 release.
    • New models/algorithms for MLlib
  • Introducing Spark Datasets : the blog post presents Datasets and how they are more performant on structured data than RDD or Dataframes.

Elasticsearch

 

 

23 Dec 2015, 09:30

Around the Web - December 2015 - AngularJS & AMP

AngularJS

AMP

Google released AMP (Accelerated Mobile Pages), a subset of HTML that keeps formatting for content to a minimum, though possibly with ads and analytics embedded. It aims to speed up the web, as Google defines it :

Instant. Everywhere.

For many, reading on the mobile web is a slow, clunky and frustrating experience - but it doesn’t have to be that way. The Accelerated Mobile Pages (AMP) Project is an open source initiative that embodies the vision that publishers can create mobile optimized content once and have it load instantly everywhere.

  • Online advertising - current situation (in French) : a summary giving the whole context on online advertising, the decrease in ad revenue, the rise of ad blockers and the possible alternatives, such as AMP but not only.
  • AMP and Responsive Web Design : a very interesting article which describes what AMP is and isn't ; in a few words:
    • AMP is more an answer to Facebook's Instant Articles and Apple News
    • AMP is strictly performance oriented, not SEO or accessibility oriented
    • It does not threaten RWD even if it can show some ways to improve it
  • Why AMP is fast : the article details how, in terms of techniques, code and platform, AMP is fast. You can take some best practices from it to apply to your websites (when relevant)

09 Dec 2015, 09:30

Around the Data - December 2015
  • HDFS is the filesystem of choice in Hadoop, with the speed and economics ideal for building an active archive.
  • For online data serving applications, such as ad bidding platforms, HBase will continue to be ideal with its fast ability to handle updating data.
  • Kudu will handle the use cases that require a simultaneous combination of sequential and random reads and writes – such as for real-time fraud detection, online reporting of market data, or location-based targeting of loyalty offers.
  • Kudu is to be released as an Apache project and Impala should become an Apache project too.
  • Kafka 0.9 is released :
    • Better security : SSL certificates, Kerberos, wire encryption, improved permissions
    • "Kafka Connect" to ease pushing/pulling data from/to Kafka. Kafka will include a file connector ; the Confluent platform will have database & Hadoop connectors.
    • User-defined quotas to throttle connections & bandwidth
    • New consumer
    • Confluent, the core contributor of Kafka, releases their distribution Confluent Platform 2.0, with all the features above and the schema registry, which allows at least versioning of your message schemas (and compatibility, from what I understood). This platform is open-source too, with paid support if needed.
    • How to Build a Scalable ETL Pipeline with Kafka Connect : an example using Kafka Connect and Schema Registry to pull data from MySQL to HDFS/Hive via Kafka.