13 Jan 2016, 09:30

Around the Data - January 2016 - Spark & Elasticsearch


  • Spark-TS is an add-on to Spark from Cloudera that eases working with time-series data: announcement; code; website
  • SparkR lets you use Spark from R (a programming language for statistical computing and graphics); a blog post in French introduces this API.
  • Spark 1.6 has been released (via); beyond bug fixes and performance improvements:
    • New Dataset API (flagged as experimental): it brings the Spark SQL engine on top of RDDs; further API changes between Datasets and DataFrames are expected after the 1.6 release.
    • New models and algorithms for MLlib
  • Introducing Spark Datasets: the blog post presents Datasets and explains how they perform better on structured data than RDDs or DataFrames.