09 Dec 2015, 09:30

Around the Data - December 2015
  • HDFS is the filesystem of choice in Hadoop, with the speed and economics ideal for building an active archive.
  • For online data-serving applications, such as ad bidding platforms, HBase will continue to be ideal thanks to its fast handling of frequently updated data.
  • Kudu will handle the use cases that require a simultaneous combination of sequential and random reads and writes – such as for real-time fraud detection, online reporting of market data, or location-based targeting of loyalty offers.
  • Kudu is to be released as an Apache project, and Impala should become an Apache project too.
  • Kafka 0.9 is released:
    • Better security: SSL certificates, Kerberos, wire encryption, improved permissions
    • "Kafka Connect" to ease pushing/pulling data to/from Kafka. Kafka will include a file connector; Confluent Platform will have database and Hadoop connectors.
    • User-defined quotas to throttle clients' bandwidth
    • A new consumer
    • Confluent, the core contributor to Kafka, releases its distribution, Confluent Platform 2.0, with all the features above plus the Schema Registry, which keeps versions of your message schemas (and, as far as I understood, checks compatibility between them). This platform is open source too, with paid support if needed.
    • How to Build a Scalable ETL Pipeline with Kafka Connect: an example of using Kafka Connect and the Schema Registry to pull data from MySQL into HDFS/Hive via Kafka.
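On the user-defined quotas: as I understand the design (KIP-13), a client that goes over its byte-rate quota is not rejected; the broker delays its responses until its rate falls back under the quota. Here is a minimal Python sketch of that delay computation, assuming a single fixed measurement window. `throttle_time_ms` is a hypothetical helper, not Kafka code.

```python
# Hypothetical sketch of byte-rate quota throttling, in the spirit of
# Kafka 0.9 client quotas. Assumes one fixed window; not Kafka's code.

def throttle_time_ms(bytes_in_window: int, window_ms: int, quota_bytes_per_sec: float) -> int:
    """Delay (ms) needed so that bytes_in_window spread over window_ms + delay
    matches the quota. Solving bytes / (window + delay) = quota gives
    delay = (observed - quota) / quota * window."""
    observed = bytes_in_window / (window_ms / 1000.0)  # bytes per second
    if observed <= quota_bytes_per_sec:
        return 0  # under quota: no throttling
    return round((observed - quota_bytes_per_sec) / quota_bytes_per_sec * window_ms)


print(throttle_time_ms(2_000_000, 1_000, 1_000_000))  # 1000
print(throttle_time_ms(500_000, 1_000, 1_000_000))    # 0
```

With a 1 MB/s quota, a client that sent 2 MB in a one-second window is delayed by one second, so over those two seconds its effective rate is back at 1 MB/s.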
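On the Schema Registry: as far as I know, each new version of a subject's schema is checked against the latest registered version, with backward compatibility as the default check (readers using the new schema must be able to decode data written with the old one). Below is a toy Python sketch of that idea, assuming flat Avro-like record schemas written as dicts. `is_backward_compatible` and `SchemaVersions` are hypothetical names, and the rules are much simpler than real Avro schema resolution (no type promotion, no nested types).

```python
# Hypothetical, simplified backward-compatibility check inspired by Avro
# schema resolution. Flat records with primitive types only.

def is_backward_compatible(old: dict, new: dict) -> bool:
    old_fields = {f["name"]: f for f in old["fields"]}
    for field in new["fields"]:
        previous = old_fields.get(field["name"])
        if previous is None:
            # Added field: old data lacks it, so the new reader needs a default.
            if "default" not in field:
                return False
        elif previous["type"] != field["type"]:
            # Changed type (this sketch ignores promotions such as int -> long).
            return False
    # Fields dropped from the new schema are simply skipped when reading old data.
    return True


class SchemaVersions:
    """Hypothetical in-memory stand-in for one registry subject."""

    def __init__(self):
        self.versions = []

    def register(self, schema: dict) -> int:
        # Only the latest version is checked, like a non-transitive mode.
        if self.versions and not is_backward_compatible(self.versions[-1], schema):
            raise ValueError("not backward compatible with the latest version")
        self.versions.append(schema)
        return len(self.versions)  # versions are numbered from 1


v1 = {"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}
v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string", "default": ""},
]}

subject = SchemaVersions()
print(subject.register(v1))  # 1
print(subject.register(v2))  # 2: adding "email" with a default is allowed
```

Adding a field without a default, or changing a field's type, would make `register` raise instead of creating a new version.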
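On the MySQL to HDFS example: the JDBC source connector can, for instance, run in an `incrementing` mode, where it remembers the highest value of an id column already copied and only queries newer rows on the next poll; Kafka Connect stores that value as the source offset so the copy can resume after a restart. A minimal sketch of the polling idea, assuming SQLite (from Python's standard library) instead of MySQL and `print` instead of writing to Kafka. `poll_new_rows` is hypothetical, not connector code.

```python
import sqlite3

# Hypothetical sketch of "incrementing" polling: the offset is the highest
# id already copied, and each poll fetches only rows with a larger id.

def poll_new_rows(conn: sqlite3.Connection, table: str, id_column: str, last_id: int):
    """Return (rows newer than last_id as dicts, new offset)."""
    # Table and column names cannot be bound as parameters; they must be trusted.
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {id_column} > ? ORDER BY {id_column}",
        (last_id,),
    )
    rows = [dict(row) for row in cursor]  # relies on conn.row_factory = sqlite3.Row
    new_offset = rows[-1][id_column] if rows else last_id
    return rows, new_offset


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

offset = 0  # nothing copied yet
rows, offset = poll_new_rows(conn, "users", "id", offset)
print(rows, offset)  # alice and bob; offset becomes 2

conn.execute("INSERT INTO users (name) VALUES (?)", ("carol",))
rows, offset = poll_new_rows(conn, "users", "id", offset)
print(rows, offset)  # only carol; offset becomes 3
```

This mode only sees new rows, not updates to existing ones; catching updates needs a timestamp column, which is why the connector also offers timestamp-based modes.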