16 Sep 2015, 09:30

Around the Data - September 2015 - SQL, NoSQL, BigData and streaming

As I am taking on new activities around big data topics from this month, I will also publish my findings on this topic here.

So the "Around the Web" edition should still be published on the last Wednesday of every month, and the "Around the Data" series should be published on a Wednesday in the middle of the month.

(No)SQL/Big Data

  • In a long interview split into two parts, "Where big data is headed and why Spark is so big" and "Why NoSQL mattered and SQL still matters", the co-creator of AMPLab (the lab behind Spark and other big data tools) reviews what happened over the last decades with the NoSQL movement, how it forced traditional databases to evolve, how it forced a change in all the paradigms around data management, and now the whole big data evolution. And SQL still matters :-) A long read, but with insights and good points.
  • In the same vein, there are some "big data" features in Postgres. Postgres has been used as a datamart for a while (but not only that) and can be used in some analytics / big data contexts. So you may start with Postgres before going further (depending on your context).
  • Two articles, "Enterprises don't have big data, they have bad data" and "You may not need big data after all", make related points. First, they insist on the issue of bad data management, both in quantity and accuracy. Then, providing the right data is nice, but providing the right data to the right person so that they can take a decision is better (cf. the 7-Eleven Japan use case). It is also about clearly defining business rules, and about more human skills such as coaching around data usage and the culture shift / change management needed to adopt a culture of evidence-based decision making.

Streaming

  • Beyond batch : Streaming 101 : introduction to streaming principles, concepts and methods.
  • NoETL
    • In the same way the NoSQL movement tends to address points that traditional databases could not handle to some extent, there is a similar movement regarding ETL (Extract, Transform and Load) tools. Instead of ETL, its proponents promote the CTP (Consume, Transform, Produce) concept.
    • The current "pitfalls" of ETL are identified as data duplication, possible data loss, cost, complexity and slowness. The idea is also to remove the intermediary step that the ETL represents as a bridge between two systems.
    • The new challenge would be to rely first on strong APIs to avoid the extract phase and data loss/duplication, then on new processing tools that allow close to real-time processing and produce outcomes, without requiring the intermediary step represented by the ETL. It requires you to switch from a batch logic (processed at a given time) to a flow mechanism.
    • The idea behind NoETL is an interesting way to review how you manage and process your data. But it has strong prerequisites: it requires your applications, systems and infrastructure to be well structured and adapted to such needs.
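
To make the Consume / Transform / Produce idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function names, the record fields (user, amount) and the in-memory list standing in for an upstream API and a downstream system are all hypothetical, and none of it comes from a specific NoETL tool. The point is that each record flows through the three steps as soon as it is available, with no intermediate staging copy.

```python
from typing import Callable, Iterable, Iterator

Record = dict


def consume(source: Iterable[Record]) -> Iterator[Record]:
    """Consume: yield records one by one as the source exposes them."""
    for record in source:
        yield record


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Transform: normalise each record on the fly, without a staging area."""
    for record in records:
        yield {
            "user": record["user"].strip().lower(),
            "amount_cents": round(record["amount"] * 100),
        }


def produce(records: Iterable[Record], sink: Callable[[Record], None]) -> int:
    """Produce: push each outcome to the sink as soon as it is ready."""
    count = 0
    for record in records:
        sink(record)
        count += 1
    return count


# Hypothetical stand-ins: a list plays the upstream API, another the target system.
events = [{"user": " Alice ", "amount": 12.5}, {"user": "BOB", "amount": 3.0}]
outcomes: list = []
processed = produce(transform(consume(events)), outcomes.append)
```

Because the three steps are chained generators, nothing is processed until produce pulls a record, which is the "flow" logic as opposed to a batch run at a given time. In a real setup the source would be an API or a message stream and the sink another system.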

18 Sep 2013, 09:30

MySQL/Postgres Roundup 18/9

On the MySQL side, or should I say the MariaDB one :

  • Google swaps out MySQL, moves to MariaDB : beyond the significant reference this represents for MariaDB and the fact that Google will sustain its effort to patch MariaDB as it did for MySQL, the question it raised for me was : how long will Percona stay with MySQL ? Percona Server was seen as an advanced version of MySQL thanks to the inclusion of some patches (like Google's ones for performance, etc.) and to the tools provided alongside it (like Percona Toolkit or XtraBackup). It does not seem they plan to make the switch so far...
  • Scaling your database via InnoDB table compression : how you can eliminate slow queries via InnoDB table compression. Constraints and limits are explained in the post.

On the Postgres side :

More generally, and even if the example used is a Postgres one, you should use UUIDs for your keys instead of traditional keys. Beyond uniqueness, if you are to use distributed systems, it would be one (or the only ?) way to avoid conflicts.
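
The conflict argument can be illustrated with a small sketch. This is not the Postgres example the post refers to, just a Python simulation under my own assumptions: two hypothetical nodes (node_a, node_b) each generate keys on their own, first with a per-node auto-increment counter, then with random UUIDs from the standard uuid module.

```python
import itertools
import uuid


def sequential_ids():
    """Per-node auto-increment counter, standing in for a traditional key."""
    return itertools.count(1)


# Two hypothetical nodes, each generating keys without talking to the other.
node_a, node_b = sequential_ids(), sequential_ids()
seq_a = [next(node_a) for _ in range(3)]
seq_b = [next(node_b) for _ in range(3)]

# Both nodes handed out 1, 2 and 3, so merging their rows would conflict.
sequential_conflicts = set(seq_a) & set(seq_b)

# Random (version 4) UUIDs need no coordination between the nodes.
uuid_a = [uuid.uuid4() for _ in range(3)]
uuid_b = [uuid.uuid4() for _ in range(3)]

# A collision is not strictly impossible, only extremely unlikely.
uuid_conflicts = set(uuid_a) & set(uuid_b)
```

The sequential keys overlap as soon as two nodes work independently, while the UUID sets stay disjoint in practice. The trade-off, which this sketch does not measure, is that UUIDs are larger than integer keys.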

05 Dec 2012, 22:16

Web Giants patterns

Octo, a French consulting company, published on their blog (in French) a series about patterns used by Web Giants (Google, Amazon, etc.). The patterns are about organisation, methodology, development and technology. So the series concerns not only IT departments but can also interest business ones.

A few examples :

  • Minimum Viable Product
  • Recurrent Beta
  • Feature flipping
  • Commodity hardware
  • Feature teams - see also how it works at Spotify
  • NoSQL databases
  • DevOps
  • Build vs Buy
  • Pizza teams
  • Cloud first
  • Etc

For each topic, the presentation principles are simple :

  • Pattern definition
  • How web giants use the pattern
  • How can I use the pattern in my firm / department / team

They also published (still in French) a book you can buy or download.

They organised a breakfast some weeks ago to introduce their book and the topic in general; you can watch the video and/or read the minutes. In French, of course.

I would strongly recommend that French readers read this series, as it shows in an accessible way how traditional IT patterns are evolving and what we can learn from such companies.

Definitely a must read for me.